Abstract

The eviction problem for memory hierarchies is studied for the Hidden Markov Reference
Model (HMRM) of the memory trace, showing how miss minimization can be
naturally formulated in the optimal control setting. In addition to the traditional version
assuming a buffer of fixed capacity, a relaxed version is also considered, in which buffer
occupancy can vary and its average is constrained. Resorting to multiobjective optimization,
viewing occupancy as a cost rather than as a constraint, the optimal eviction policy
is obtained by composing solutions for the individual addressable items.

This approach is then specialized to the Least Recently Used Stack Model (LRUSM),
a type of HMRM often considered for traces, which includes V − 1 parameters, where V
is the size of the virtual space. A gain optimal policy for any target average occupancy is
obtained which (i) is computable in time O(V) from the model parameters, (ii) is optimal
also for the fixed capacity case, and (iii) is characterized in terms of priorities, with the
name of Least Profit Rate (LPR) policy. An O(log C) upper bound (C being the buffer
capacity) is derived for the ratio between the expected miss rate of LPR and that of
OPT, the optimal off-line policy; the upper bound is tightened to O(1), under reasonable
constraints on the LRUSM parameters. Using the stack-distance framework, an algorithm
is developed to compute the number of misses incurred by LPR on a given input trace,
simultaneously for all buffer capacities, in time O(log V) per access.

Finally, some results are provided for miss minimization over a finite horizon and over 
an infinite horizon under bias optimality, a criterion more stringent than gain optimality.

Keywords: Eviction policies, Paging, Online problems, Algorithms and data structures,
Markov chains, Optimal control, Multiobjective optimization. 

1 Introduction to Eviction Policies for the Memory Hierarchy 

The storage of most computer systems is organized as a hierarchy of levels (currently, half a
dozen), for technological and economical reasons [27] as well as due to fundamental physical
constraints [13]. Memory hierarchies have been investigated extensively, in terms of hardware
organization [22] [43] [27], operating systems [45], compiler optimization [3] [55] [26], models of
computation [44], and algorithm design [1]. A central issue is the decision of which data to keep
in which level. It is customary to focus on two levels (see Fig. 1), respectively called here the
buffer and the backing storage, the extension to multiple levels being generally straightforward.
(We adopt the neutral names used in the seminal paper of Mattson et al. [36], since the
concepts introduced have application at each level of the memory hierarchy: register allocation,
CPU caching, memory paging, web caching, etc.) We assume that an engine generates a
sequence of access requests $a_1, a_2, \ldots, a_t, \ldots$ for items (equally sized blocks of data) stored
in the hierarchy. A request is called a hit if the requested item is in the buffer and a miss
otherwise. Upon a miss, the item must be brought into the buffer, a costly operation. If the
buffer is full, the requested item will replace another item. We are interested in an eviction
policy that selects the items to be replaced so as to minimize subsequent misses.

Figure 1: Two-level memory hierarchy.

The MIN policy of Belady [7] and the OPT policy¹ of Mattson, Gecsei, Slutz, and Traiger
[36] minimize the number of misses in an off-line setting. Since, in practical situations, the
address trace unfolds with the computation, eviction decisions must be made on-line. Dozens
of on-line policies have been proposed and implemented in hardware or within operating
systems. Somewhat schematically, we can say that implemented policies have evolved mostly
experimentally, by benchmarking plausible proposals against relevant workloads [43]. Most of
these policies are variants of the Least Recently Used (LRU) policy. Departure from pure LRU
is motivated to a large extent by its high implementation cost. However, several authors have
also explored variants of LRU that incur fewer misses, at least on some workloads [37].

Theoretical investigations have focused mostly on two objectives: (a) to "explain" the 
practical success of LRU-like policies and (b) to explore the existence of better policies. One 
major question is how to model the input traces (i.e., the lists of requested memory references). 
An interesting perspective, proposed by Koutsoupias and Papadimitriou [31], is to model traces 
by a class of stochastic processes, all to be dealt with by the same policy. For a given stochastic 
model, two metrics help assess the quality of a policy: (i) the expected number of misses and 
(ii) its ratio with the expected number of misses incurred by OPT, called the competitive ratio. 
The competitive ratio of a policy with respect to a class of stochastic processes is defined in a 
worst-case sense, maximizing over the class. 

Competitive analysis was proposed by Sleator and Tarjan [46] for the class of all possible
traces (all stochastic processes), for which they showed that the competitive ratio of any
on-line policy is at least the buffer capacity C, a value actually achieved by, e.g., LRU and
FIFO. While theoretically interesting, these results shed little light on what is experimentally
known. For example, if we restrict the possible traces to the ones which actually emerge in
practical applications, the miss ratio between LRU and OPT is much smaller than C, being
typically around 2 and seldom exceeding 4. Borodin et al. [15] restrict the class of possible
traces to be consistent with an underlying "access graph" that models the program locality
(nodes correspond to items and legal traces correspond to walks); a heuristic policy based on
this model has been developed by Fiat and Rosen [21] and has been shown to perform better
than LRU, relative to some benchmarks.

¹ OPT: Upon a miss, if the buffer contains items that will not be accessed in the future then evict (any) one
of them, else evict the (unique) item whose next access is furthest in the future.
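
As an illustration of the gap between on-line and off-line eviction, the OPT rule stated in the footnote can be set next to LRU in a short sketch. The function names `lru_misses` and `opt_misses` are hypothetical, and the code is only a direct simulation under the assumption of a fixed buffer capacity and a fully known trace (for OPT); it is not taken from the paper.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count the misses of LRU on `trace` with a buffer of `capacity` items."""
    buffer = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for item in trace:
        if item in buffer:
            buffer.move_to_end(item)        # hit: refresh recency
        else:
            misses += 1
            if len(buffer) >= capacity:
                buffer.popitem(last=False)  # evict the least recently used item
            buffer[item] = True
    return misses


def opt_misses(trace, capacity):
    """Count the misses of OPT: evict the item whose next access is furthest."""
    n = len(trace)
    # next_use[t] = time of the next access to trace[t] after t (n if none).
    next_use = [n] * n
    last_seen = {}
    for t in range(n - 1, -1, -1):
        next_use[t] = last_seen.get(trace[t], n)
        last_seen[trace[t]] = t
    buffer = {}  # item -> time of its next access
    misses = 0
    for t, item in enumerate(trace):
        if item not in buffer:
            misses += 1
            if len(buffer) >= capacity:
                # Items never accessed again have next use n, the maximum.
                victim = max(buffer, key=buffer.get)
                del buffer[victim]
        buffer[item] = next_use[t]
    return misses
```

On the cyclic trace 1, 2, 3, 1, 2, 3, 1, 2, 3 with capacity 2, LRU misses on all 9 accesses while OPT misses on 6.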

Albers et al. [2] propose a trace restriction based on Denning's working set concept [17]:
for a given function f(n), the (average or maximum) number of distinct items referenced in
n consecutive steps is at most f(n). LRU is proved to be optimal in both the average and
maximum cases.

Koutsoupias and Papadimitriou [31] have considered the class $\Delta_\epsilon$ ($0 < \epsilon < 1$) of stochastic
processes such that, given any prefix of the trace, the probability to be accessed next is
at most $\epsilon$ for any item. A combinatorially rich development establishes that LRU achieves
minimum competitive ratio, for each value of $\epsilon$. A careful analysis of how the competitive ratio
depends upon both $\epsilon$ and C has been provided by Young [58]: in particular, the competitive
ratio increases with $\epsilon$. Qualitatively speaking, buffering is more efficient exactly when the
items in the buffer are more likely to be referenced than those outside the buffer. Hence, for
traces where LRU (or any policy) exhibits good performance (that is, few misses), $\epsilon$ must be
correspondingly high. Then, the competitive ratio is also high, unlike what is observed in
practice. For a more quantitative appraisal of the issue, consider that, at any level of the
memory hierarchy of real systems, the average miss ratio is typically below 1/4 (even well
below 1% in main memory). Let then $C_{1/2}$ be the buffer capacity at which the miss rate
would be 1/2. It is easy to see that it must be $\epsilon C_{1/2} \ge 1/2$. The actual buffer capacity C is
typically considerably larger than $C_{1/2}$, say $C \ge 4 C_{1/2}$, by the rule of thumb that quadrupling
the cache capacity halves the miss ratio [43]. Then, $\epsilon C \ge 2$. By the bounds of Young [58], the
$\Delta_\epsilon$ competitive ratio of LRU is at least C/2, which is not much more informative than the
value C of [46].
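
The arithmetic behind this estimate can be spelled out as follows. This is only a sketch of the reasoning, under the assumption that a buffer holding $C_{1/2}$ items, each accessed next with probability at most $\epsilon$, yields a hit with probability at most $\epsilon C_{1/2}$.

```latex
\[
\frac{1}{2} \;=\; 1 - \text{(miss rate at capacity } C_{1/2}\text{)}
\;\le\; \epsilon\, C_{1/2}
\qquad\Longrightarrow\qquad
\epsilon\, C \;\ge\; 4\, \epsilon\, C_{1/2} \;\ge\; 2
\quad \text{whenever } C \ge 4\, C_{1/2} .
\]
```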

To "explain" both the low miss rate and the low competitive ratio of LRU in practical
cases, the approach of [31] requires some restriction to the class of stochastic processes, in
order to capture temporal locality of the reference trace, while essentially assigning low
probability to those traces for which OPT vastly outperforms LRU. The model and results of
Becchetti [6] can be viewed as an exploration of this direction.

Following what could be viewed as an extreme case of the approach outlined above, a
number of studies have focused on specific stochastic processes (the class is a singleton), with
the objective of developing (individually) optimal policies for such processes. In [23], the
address trace $a_1, a_2, \ldots, a_t, \ldots$ is taken to be a sequence of mutually independent random
variables, a scenario known as the Independent Reference Model (IRM). While attractive for
its simplicity, the IRM does not capture the cornerstone property that memory hierarchies
rely upon: the temporal locality of references. To avoid this drawback, the LRU-Stack Model
(LRUSM) of the trace [39] [49] has been widely considered in the literature [54] [19] [30] and is
the focus of much attention in this paper. Here, the trace is statistically characterized by the
probability s(j) of accessing the j-th most recently referenced item. The values of j for different
accesses are assumed to be statistically independent. Throughout, we assume that the items
being referenced belong to a finite virtual space of size V. If s is monotonically decreasing,
LRU is optimal [49] (for an infinite trace); in general, the optimal policy varies with s and can
differ from LRU, as shown by Wood, Fernandez, and Lang [56] [57]. Smaragdakis, Kaplan, and
Wilson [47] [48] introduced the EELRU policy as a heuristic inspired by the LRUSM. The model
of Becchetti [6] mentioned above is similar to LRUSM, essentially dropping the assumption
of independence, while restricting the form of the conditional distribution of accessing the
j-th most recent item, given the value of the past trace. LRU is compared to OPT under this
model.
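
The LRUSM described above lends itself to a compact simulation. The following sketch samples a trace by drawing independent stack distances from s and moving the accessed item to the top of the LRU stack; the names `lrusm_trace` and `lru_miss_rate`, the 0-based item labels, and the initial stack order are assumptions made for illustration only.

```python
import random


def lrusm_trace(s, length, seed=0):
    """Sample a trace from the LRU stack model.

    s[j-1] is the probability of accessing the j-th most recently
    referenced item; the virtual space has V = len(s) items.
    """
    rng = random.Random(seed)
    V = len(s)
    stack = list(range(V))      # stack[0] is the most recently used item
    depths = range(1, V + 1)
    trace = []
    for _ in range(length):
        j = rng.choices(depths, weights=s)[0]  # independent stack distance
        item = stack.pop(j - 1)
        stack.insert(0, item)                  # accessed item moves to the top
        trace.append(item)
    return trace


def lru_miss_rate(s, capacity):
    """Steady-state LRU miss rate: LRU keeps the top `capacity` stack
    positions, so it misses exactly when the stack distance exceeds it."""
    return sum(s[capacity:])
```

For instance, with s = (0.5, 0.3, 0.2) an LRU buffer of capacity 1 misses with probability 0.3 + 0.2 in steady state, which is what `lru_miss_rate` returns up to rounding.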

A different generalization of the IRM model is the Markov Reference Model (MRM),
already suggested by [36], where the address trace is a finite Markov chain. A wealth of
results are obtained by Karlin, Phillips, and Raghavan [29] for MRM, including the Commute
Algorithm, a remarkable policy computable from the transition probabilities of the chain
in polynomial time, whose expected miss rate is within a constant factor of optimum. We
underscore that MRM and LRUSM are substantially different models; in general, while the
MRM trace is itself a Markov process with V states, the LRUSM trace is a function of a
Markov process with V! states.

Paper outline In this work, we further the study of the LRUSM, deriving new results and 
strengthening its understanding. Some methodological aspects of our investigation, however, 
can be of interest for a wider class of models of the reference trace, hence they will be presented 
in a more general context. 

A first methodological aspect we explore is the possibility as well as the fruitfulness of
casting miss minimization as a problem of optimal control theory (or, equivalently, as a Markov
Decision Problem [32]). In Section 2, we show that this is possible whenever the trace is a hidden
Markov process, a scenario which we call the Hidden Markov Reference Model (HMRM). The
IRM, the MRM, and the LRUSM are all special cases of the HMRM. We refer to the classical
optimal control theory framework as presented, for instance, in the textbook of Bertsekas [11].
In the dynamical system to be controlled, the state encodes the content of the buffer and some
information on the past trace. The disturbance input models the uncertainty in the address
trace, while the control input encodes the eviction decisions available to the memory manager.
The cost per step is one if a miss is incurred and zero otherwise. We modify the standard
assumption that the control is a function only of the state and allow it to depend on the
disturbance as well: $u_t = \mu_t(x_t, w_t)$. This modification is necessary since eviction decisions are
actually taken with full knowledge of the current access. The Bellman equation characterizing
the optimal policies has to be modified accordingly. The technicalities of this adaptation are
dealt with in the Appendix.

A second methodological aspect we explore is a generalization of the buffer management 
problem where rather than imposing the capacity as a fixed constraint, we let the number of 
buffer positions vary dynamically, under the control of the management policy. The average 
buffer occupancy becomes a second cost, in addition to the miss rate, and the tradeoff between 
the two costs is of interest. This problem could have practical applications. For example, in a 
time-shared environment, processes could be charged for their average use of main memory 
and hence be interested in utilizing more memory in phases where this can result in significant 
page-fault reduction and less memory in other phases. In addition, the study of the average 
occupancy problem sheds significant light even on the solution of the fixed capacity problem. 
In particular, for the LRUSM, an optimal policy for the case of a fixed buffer can be simply 
obtained from optimal policies for the average occupancy case. In Section 3, the average
buffer occupancy problem is studied for the HMRM, exploiting techniques of multi-objective
optimization. A key advantage lies in the possibility of composing a global policy from policies
tailored to the individual items in the virtual space. Furthermore, the approach naturally lends
itself to the efficient management of a buffer shared among different processes. Throughout this
section, the notion of optimality of the policies under consideration is that of gain optimality 
over an infinite horizon. 

In Section 4, the framework and the results developed in the two previous sections are
applied to the LRUSM. After reviewing the LRU stack model, a class of optimal policies is
derived, for an arbitrary distribution s, for the average occupancy problem. It is then shown
that this class of policies includes some that use a buffer of fixed capacity C. More specifically,
it is shown that the minimum miss rate under a constraint on the average occupancy can be
attained with fixed capacity. A buffer of capacity C is optimally managed by a K-L policy
[56] [57], which can be specified by two parameters, denoted as K = K(C) and L = L(C). K-L
policies include, as special cases, LRU (K = C, ∀L) and Most Recently Used (MRU) (K = 1,
L = V). Our derivation of the optimal policy has a number of advantages. (i) The optimality
is established for the more general setting of average occupancy. (ii) The policy is naturally
described in terms of a system of priorities for the eviction of items. As a corollary of a result of
[36], it follows that the policy does satisfy the inclusion property: if an item is in a given buffer,
then it is also in all buffers of larger capacity. The inclusion property rules out the so-called
Belady anomaly [8] and enables more efficient algorithms for its performance evaluation. (iii)
By linking the eviction priorities to the (planar) convex hull of the Pareto optimal points of
the average occupancy problem and by adapting Graham's scan, an algorithm is derived for a
linear time computation of the values K(C) and L(C) for all relevant values of C. (Previously
known properties of the K-L policy lead to a straightforward cubic algorithm.) (iv) Finally, the
priorities can be shown to correspond to a suitable notion of profit rate of an item, informally 
capturing the best achievable ratio between expected hits and the expected occupancy for 
that item, leading to the concept of Least Profit Rate (LPR) policy. The LPR policy can be 
defined for models different from the LRUSM for which, while not necessarily optimal, it may 
yield a good heuristic. 

In Section 5, we show that the ratio x between the expected miss rate of the optimal
on-line policy, LPR, and that of OPT is O(log C). Moreover, for the class of stack access
distributions s for which the miss rate of LPR is lower bounded by some constant $\beta > 1/C$,
we have $x < 2 \ln(2/\beta) \in O(1)$.

The ability to efficiently compute the number of misses for buffers of various capacities 
when adopting a given policy for benchmark traces is of key interest in the design of hardware 
as well as software solutions for memory management \12\ |5H |53] . In Section [6] we develop an 
algorithm to compute the LPR misses for all buffer capacities in time O(log V) per access,
providing a rather non trivial generalization of an analogous result for LRU O |3] . 

In the remainder of the paper, we explore alternate notions of optimality. In Section 7, we
consider optimization over a finite interval, or horizon. Technically, the optimal control problem
is considerably harder. As an indication, even if the system dynamics, its cost function, and
the statistics of the disturbance are all time invariant, the optimal control policy is in general
time-dependent. We show that, for any monotonically non-increasing stack distribution s, LRU
is an optimal policy for any finite horizon, whereas MRU is optimal if the stack distribution
is non-decreasing. While these results appear symmetrical and highly intuitive, their proofs
are substantially different and anything but straightforward. The standard approach based on the
Bellman equation, which requires "guessing" the optimal cost as a function of the initial state,
does not seem applicable, lacking a closed form for such a function. We have circumvented this
obstacle by establishing an inductive invariant on the relative values of the cost for select pairs 
of states. This approach may have applicability to other optimal-control problems. Some of 
the results are derived for a considerably more general version of the LRUSM, which does not 
assume the statistical independence of stack distances at different steps. 

Finally, in Section 8, we take a preliminary look at bias optimality, a property stronger
than gain optimality, but also considerably more difficult to deal with. As an indication of the 
obstacles to be faced, we prove that, in some simple cases of LRUSM, no bias-optimal policy 
satisfies the useful inclusion property. We also develop a closed-form solution for the simplest 
non-trivial case of buffer capacity, that is, C = 2. The derivation as well as the results are 
not completely straightforward, suggesting that the solution for arbitrary buffer capacity may 
require considerable ingenuity. 

We conclude the paper with a brief discussion of directions for further research. 

2 Optimal Control Formulation of Eviction for the Hidden 
Markov Reference Model 

In the typical problem of optimal control [11], one is given a dynamical system described by a
state-transition equation of the form

$$x_{t+1} = f(x_t, u_t, w_t) , \tag{1}$$

where $x_t$ is the state at time t, while both $u_t$ and $w_t$ are inputs, with crucially different roles.
Input $u_t$, called the control, can be chosen by whoever operates the system. In contrast, input
$w_t$, historically called the disturbance, is determined by the environment and modeled as a
stochastic process. At each step t, a cost is incurred, given by some function $g(x_t, u_t, w_t)$. The
objective of optimal control is to find a control policy $u_t = \mu_t(x_t)$ so as to minimize the total
cost

$$E\left[ \sum_{t \in I} g(x_t, u_t, w_t) \right] , \tag{2}$$

where I is a time interval of interest. A key premise of most optimal control theory is the
assumption of past-independent disturbances (PID): given the current state $x_t$, the current
disturbance $w_t$ is statistically independent of past disturbances $\{w_\tau : \tau < t\}$.

In this section, we define a dynamical system whose optimal control corresponds to the
minimization of the number of misses when the reference trace can be expressed as a function
$a_t = r(z_t)$ of a Markov chain $z_t$ with a finite state space Z. We call this scenario the Hidden
Markov Reference Model (HMRM). In the following we assume the system to be unichain, i.e.,
under any stationary policy the Markov chain associated with the system evolution has only a
single recurrent class (e.g., a Markov chain with two non-communicating classes is not unichain).
This hypothesis guarantees that the average cost in infinite horizon does not depend on the
initial state and that there is always a solution to the Bellman equation.

To cast eviction as a problem in optimal control, the state of our dynamical system will
model both the Markov chain underlying the trace and the content of the buffer:

$$x_t = (z_t, b_t) , \tag{3}$$

where $b_t \in \{0, 1\}^V$ is a Boolean vector such that, for $j = 1, \ldots, V$,

$$b_t(j) = \begin{cases} 1 & \text{if item } j \text{ is in the buffer,} \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

Toward formulating the transition function governing the evolution of $x_t$, let us first observe
that any Markov chain can be written as

$$z_{t+1} = \phi(z_t, w_t) , \tag{5}$$

where $w_t \in W$ is a sequence of equally distributed random variables, independent of each
other and of the initial state $z_0$. Furthermore, W is a finite set with $|W| < |Z|(|Z| - 1)$. We
take $w_t$ to be the "disturbance" in (1). We let the control input $u_t \in \{0, 1, \ldots, V\}$ encode the
eviction decisions, with 0 denoting no eviction (the only admissible control in case of a hit)
and $j > 0$ denoting the eviction of item j (an admissible control only when a miss occurs
and item j is in the buffer, i.e., $b_t(j) = 1$). We can then write:

$$b_{t+1} = \psi(b_t, u_t) , \tag{6}$$

where the transition function $\psi$ is specified as

$$b_{t+1}(j) = \begin{cases} 1 & j = a_t = r(z_t) , \\ 0 & j = u_t , \\ b_t(j) & \text{otherwise.} \end{cases} \tag{7}$$

Finally, we have:

$$x_{t+1} = (z_{t+1}, b_{t+1}) = (\phi(z_t, w_t), \psi(b_t, u_t)) = f(x_t, u_t, w_t) . \tag{8}$$

The instantaneous cost function g is simply

$$g(x_t, w_t) = \begin{cases} 1 & \text{if a miss occurred,} \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$

Finally, we assume that a policy can set the control $u_t$ with knowledge of both the state
and the disturbance: $u_t = \mu_t(x_t, w_t)$. This requires adaptation of some results derived in the
control theory literature, which typically assume $u_t = \mu_t(x_t)$.
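
One transition of this system, following (5)-(9), can be sketched in a few lines. The function name `step`, the tuple encoding of the Boolean vector, and the callables `phi` and `r` passed as arguments are illustrative assumptions; the sketch only mirrors the equations and does not enforce any capacity constraint.

```python
def step(z, b, u, w, phi, r):
    """One transition of the eviction system, following (5)-(9).

    z   : state of the hidden Markov chain
    b   : tuple of V bits, b[j-1] == 1 iff item j is in the buffer
    u   : control, 0 for "no eviction" or the item j >= 1 to evict
    w   : disturbance
    phi : chain update, z_next = phi(z, w)
    r   : output map, the requested item is a = r(z)
    Returns (z_next, b_next, cost), where cost is 1 on a miss.
    """
    a = r(z)
    miss = b[a - 1] == 0
    if u != 0 and not (miss and b[u - 1] == 1):
        # Evictions are admissible only on a miss and only for buffered items.
        raise ValueError("inadmissible control")
    b_next = list(b)
    if u != 0:
        b_next[u - 1] = 0
    b_next[a - 1] = 1  # the requested item is always brought into the buffer
    return phi(z, w), tuple(b_next), int(miss)
```

In the IRM, for example, one may take the chain state to be the requested item itself, with `r` the identity and `phi(z, w) = w`, so that the disturbance is simply the next request.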

Consider a policy $\pi = (\mu_1, \mu_2, \ldots, \mu_\tau)$ applied to our system during the time interval $[1, \tau]$,
so that, for t in this interval, we have

$$x_{t+1} = f(x_t, \mu_t(x_t, w_t), w_t) . \tag{10}$$

We define the cost of $\pi$, starting from state $x_0$, with time horizon $\tau$ as

$$J_\tau^\pi(x_0) = \mathop{E}_{w} \left[ \sum_{t=0}^{\tau-1} g(x_t, w_t) \right] , \quad \text{where } w = [w_0, w_1, \ldots, w_{\tau-1}] , \tag{11}$$

where the $x_t$'s are subject to (10) and the expected value averages over disturbances $w_t$. For
our system, this is the expected number of misses in $\tau$ steps. The optimal cost is

$$J_\tau^*(x_0) = \min_\pi J_\tau^\pi(x_0) . \tag{12}$$

The optimal cost satisfies the following dynamic-programming recurrence (analogous to Eq. 1.6
in Vol. 1 of [11])

$$J_\tau^*(x_0) = \mathop{E}_{w} \left[ \min_{u \in U(x_0, w)} \left\{ g(x_0, w) + J_{\tau-1}^*(f(x_0, u, w)) \right\} \right] , \tag{13}$$

where $U(x_0, w)$ denotes the set of controls admissible when the state is $x_0$ and the disturbance
is w. Introducing a vector $J_\tau^*$ whose components are the values $J_\tau^*(x)$, in some chosen order of
the states, (13) can be rewritten as

$$J_\tau^* = T J_{\tau-1}^* , \tag{14}$$

where T is the optimal-cost update operator.
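
The recurrence (13)-(14) can be iterated directly on a small finite system. The sketch below is a minimal illustration, assuming a finite state set, a finite disturbance distribution given as a dictionary, and $J_0 = 0$; the function name `finite_horizon_costs` and its argument conventions are hypothetical. The point it makes concrete is that the minimization sits inside the expectation, because the control may depend on the current disturbance.

```python
def finite_horizon_costs(states, disturbances, controls, f, g, horizon):
    """Iterate J_tau = T J_{tau-1} starting from J_0 = 0.

    disturbances : dict mapping each disturbance w to its probability
    controls     : controls(x, w) -> iterable of admissible controls
    f            : f(x, u, w) -> next state
    g            : g(x, w) -> instantaneous cost
    """
    J = {x: 0.0 for x in states}
    for _ in range(horizon):
        # One application of the operator T: expectation over w of the
        # best cost-to-go, where the control is chosen knowing w.
        J = {
            x: sum(
                p * min(g(x, w) + J[f(x, u, w)] for u in controls(x, w))
                for w, p in disturbances.items()
            )
            for x in states
        }
    return J
```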

In applications where the temporal horizon of interest is long and perhaps not known a
priori, one is interested in policies that are optimal over an infinite horizon; an added benefit
is that such policies are provably stationary, under very mild conditions. Usually, the cost
defined in (11) diverges as $\tau \to \infty$, thus alternate definitions of optimality are considered
in [32] [5] [11], like gain and bias optimality.

Gain Optimality refers to a policy $\mu^*$ that achieves the lowest average cost (that can be
shown to be independent of x): $\mu^* = \arg\min_\mu \lim_{\tau \to \infty} J_\tau^\mu(x)/\tau$.

Bias Optimality refers to a policy $\mu^*$ that is as good as any other stationary policy for $\tau$
long enough, i.e., $\forall x \ \lim_{\tau \to \infty} J_\tau^\mu(x) - J_\tau^{\mu^*}(x) \ge 0$. (Note that a bias optimal policy is
also gain optimal.)

In this work we mostly concentrate on the standard gain optimality concept (which
corresponds to the miss rate minimization), but we also give some particular results for the
stronger concept of bias optimality in Section 8. In Sections 3 and 8 we will use the classical
Bellman equation, which characterizes optimal control policies in infinite horizon as solutions
of a fixed point equation [11]. We show in Appendix A that the result holds in our model as
well:

Proposition 1. Let A be a dynamical system, with state space X, whose control $u_t$ can be
chosen with knowledge of the disturbance $w_t$. If $\exists \lambda \, \exists h : \lambda \mathbf{1} + h = T h$, then $\lambda$ is the optimal
average cost of A and h is the vector of the differential costs of the states, i.e.

$$\forall x, y \in X \quad \lim_{\tau \to \infty} J_\tau^*(x) - J_\tau^*(y) = h(x) - h(y) . \tag{15}$$

In the rest of the paper devoted to infinite horizon, since the costs will be independent of
the initial state $x_0$, we will simplify (and abuse a little) the notation by setting $J^\mu(x_0) = J(\mu)$.

3 The Average Occupancy Eviction Problem



The classical form of the eviction problem, as reviewed in the Introduction, is based on the
assumption of a fixed capacity buffer: when the buffer is full, an eviction is required every
time a miss occurs, otherwise no eviction is performed. In this section, we consider a relaxed
version of the eviction problem where the buffer is of potentially unlimited capacity (C = V
is actually sufficient) and a variable portion of it can be occupied at different times. The
requirement that the item being referenced must be kept in the buffer or brought in if not already
there is retained. However, after an access (whether a hit or a miss), any item in the buffer,
except for the one just accessed, can be evicted. In this relaxed buffer management scenario,
in addition to the miss rate, an interesting cost metric is the average occupancy of the buffer.
We call average occupancy eviction problem the minimization of the miss rate, given a target
value for the average occupancy. This problem has direct applications, as mentioned in the
Introduction. However, the study of average occupancy also sheds light on the fixed capacity
version, as we will see in particular in the next section for the LRUSM. A key advantage of
the average occupancy problem is that its solution can be obtained by combining policies for the
individual items considered in isolation.



3.1 The Average Occupancy of Single Items 

When focusing on a single item, say $\omega$, a first simplification arises from the fact that the state
of the buffer, denoted $\beta_\omega$, is just a binary variable, set to 1 when the item is kept in the buffer
and to 0 otherwise. One also can restrict attention to the $\omega$-trace $a_{\omega,t}$, defined as 1 when the
item is referenced ($a_t = \omega$) and as 0 otherwise. Clearly, if the full trace $a_t$ is a hidden Markov
process, so is the $\omega$-trace. However, in some cases, the $\omega$-trace can be described with
fewer states, forming a set $Z_\omega$. For example, the reduction is dramatic for the LRUSM, where
$|Z| = V!$ while $|Z_\omega| = V$, for every item $\omega$. In general, we can write

$$a_{\omega,t} = r_\omega(z_{\omega,t}) , \quad \text{with } z_{\omega,t+1} = \phi_\omega(z_{\omega,t}, w_{\omega,t}) . \tag{16}$$

Process $z_{\omega,t}$ is called the Characteristic Generator (CG) of item $\omega$. The control input
$u_\omega \in \{0, 1\}$ determines whether $\omega$ is evicted from the buffer ($u_\omega = 1$), which is admissible
only when the item is in the buffer ($\beta_\omega = 1$) and it is not currently accessed ($a_\omega = 0$). This
amounts to specifying the function $\psi_\omega$ such that

$$\beta_{\omega,t+1} = \psi_\omega(\beta_{\omega,t}, u_{\omega,t}) . \tag{17}$$

The overall state of the control system is then $x_\omega = (z_\omega, \beta_\omega)$, evolving as

$$x_{\omega,t+1} = f_\omega(x_{\omega,t}, u_{\omega,t}, w_{\omega,t}) . \tag{18}$$

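We are interested in two types of cost: buffer occupancy and misses. Correspondingly, we
introduce two instantaneous cost functions:

$$g_{oc,\omega}(x_{\omega,t}, w_{\omega,t}) = \beta_{\omega,t} , \qquad g_{ms,\omega}(x_{\omega,t}, w_{\omega,t}) = \begin{cases} 1, & \beta_{\omega,t} = 0 \wedge a_{\omega,t} = 1 \text{ (miss)} \\ 0, & \text{otherwise.} \end{cases} \tag{19}$$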

We study eviction policies (of the form $u_{\omega,t} = \mu_\omega(x_{\omega,t}, w_{\omega,t})$) from a multiobjective perspective
[20] [38]. Obviously, there is a tradeoff between the two costs, occupancy being minimized by
evicting the item as soon as possible and misses being minimized by never evicting it. The set
of "good" solutions to multiobjective optimization problems is the Pareto set, i.e., the set of
all policies whose costs are not dominated by the costs of other policies (Pareto points are also
known as Efficient Points, EPs).

Figure 2: A Randomized Mixture of Policies (RMoP). Achieving C as average buffer occupancy
using a probabilistic combination of two policies.

A peculiarity of multiobjective optimization problems is that, since we are studying
tradeoffs among the costs, sometimes we can usefully introduce Randomized Mixtures of
Policies (RMoPs) to obtain more points in the costs space. E.g., consider the problem of
choosing a route every day from home to work between two possible routes $\mu_1$ and $\mu_2$, with
costs $J(\mu)$ defined by the driving time and gas used. If $J(\mu_1)$ = (20 minutes, 10 dollars) and
$J(\mu_2)$ = (30 minutes, 5 dollars), by choosing every day with the same probability $\mu_1$ or $\mu_2$ we have
long-run average costs equal to (25 minutes, 7.50 dollars), which cannot be obtained by choosing
always the same route. To properly define an RMoP for the average occupancy problem for a
single item, we observe that there always exists a "hit" state $x^* = (z^*, 1)$ (with $r_\omega(z^*) = 1$)
which is recurrent in the system evolution (otherwise the item frequency would be zero and
there would be no need to buffer it). If we call $t_1, t_2, \ldots$ the times at which the system
enters the recurrent state, then the problems of choosing an eviction policy in the intervals
$[t_i + 1, t_{i+1}]$ are independent; by randomly choosing for each interval between two policies we
can obtain all the convex combinations of the costs of the original non-mixture policies (see
Fig. 2). In more detail, if there exist policies $\mu'$ and $\mu''$ with $J_{oc}(\mu') = \eta'$ and $J_{oc}(\mu'') = \eta''$ such
that $C = \gamma \eta' + (1 - \gamma) \eta''$ for some $\gamma \in [0, 1]$, we write $\mu = \mathrm{rand}_\gamma(\mu', \mu'')$ to mean that $\mu$ is an
RMoP that chooses $\mu'$ with probability $\gamma$ and $\mu''$ with probability $1 - \gamma$ (the random choice
being made every time the system leaves the state $x^*$). Note that an analogous equation holds
also for the miss rate cost: $J_{ms}(\mu) = \gamma J_{ms}(\mu') + (1 - \gamma) J_{ms}(\mu'')$.
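
The mixing arithmetic can be made concrete with a short sketch. The helper names `mixture_weight` and `mixture_cost` are hypothetical; the code only solves the convex-combination equation for the mixing probability and applies it to a pair of cost vectors.

```python
def mixture_weight(target, occ_a, occ_b):
    """Probability gamma of choosing policy a such that the mixture's
    average occupancy gamma*occ_a + (1 - gamma)*occ_b equals `target`."""
    if occ_a == occ_b:
        raise ValueError("the two policies have the same occupancy")
    gamma = (target - occ_b) / (occ_a - occ_b)
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("target is not between the two occupancies")
    return gamma


def mixture_cost(gamma, cost_a, cost_b):
    """Convex combination of two cost vectors, component by component."""
    return tuple(gamma * a + (1 - gamma) * b for a, b in zip(cost_a, cost_b))
```

With the route example above, `mixture_cost(0.5, (20, 10), (30, 5))` gives the pair (25.0, 7.5), and `mixture_weight(25, 20, 30)` recovers the probability 0.5.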

Since any point in the costs space between two policies can be obtained by an appropriate
RMoP, the set of "good" policies in this context maps into the set S of Supported EPs
(SEPs, also known as Pareto-convex points) which have costs not dominated by any convex
combination of the costs of other policies. The set of SEPs can in general be of size exponential
in $|Z_\omega|$ and thus it is interesting to find a polynomial approximation $\tilde{S}$ of S. More specifically,
for every point $(J_1, J_2)$ in S we want two points $(J_1', J_2')$ and $(J_1'', J_2'')$ in $\tilde{S}$ and a real number
$0 \le \gamma \le 1$ such that, for $q = 1, 2$, we have $J_q (1 + \epsilon) \ge \gamma J_q' + (1 - \gamma) J_q''$.
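
For a finite set of cost points, the distinction between efficient points and supported efficient points can be sketched as follows. This is an illustration on explicit (occupancy, miss rate) pairs, not the approximation scheme discussed in the text; the function names are hypothetical, and the hull step uses the monotone-chain variant of Graham's scan.

```python
def _cross(o, a, b):
    """Cross product of the vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def pareto_points(points):
    """Points not dominated by any other point (both costs are minimized)."""
    best = []
    for p in sorted(set(points)):
        # After sorting by the first cost, a point is efficient only if it
        # strictly improves the second cost over all points kept so far.
        if not best or p[1] < best[-1][1]:
            best.append(p)
    return best


def supported_points(points):
    """Efficient points lying on the lower convex hull, i.e. not dominated
    by any convex combination of other points."""
    hull = []
    for p in pareto_points(points):
        # Drop the last hull point while it lies on or above the segment
        # joining its predecessor to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull
```

For the points (0, 10), (1, 4), (2, 3), (2, 5), (3, 0), the point (2, 5) is dominated by (2, 3), while (2, 3) is efficient but not supported, since the midpoint of (1, 4) and (3, 0) is (2, 2).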

A necessary and sufficient condition [42] [18] to get a Polynomial Time Approximation
Scheme (PTAS) to construct the approximation $\tilde{S}$ of S is the availability of a PTAS for solving
the single-objective scalarized problem (SCAL) which has the following cost:

$$G_{\omega,\theta}(x_\omega, w_\omega) = \cos(\theta) \, g_{oc,\omega}(x_\omega, w_\omega) + \sin(\theta) \, g_{ms,\omega}(x_\omega, w_\omega) . \tag{20}$$

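The PTAS to build the approximation $\tilde{S}$ runs in time polynomial in both $|Z_\omega|$ and $1/\epsilon$. (Note that any relative
weight between the two costs can be achieved by a suitable choice of $\theta \in [0, \pi/2]$.) While SCAL
is interesting in its own right, here we are considering it mainly as a tool to find a polynomial
approximation of the set of SEPs. For the single item problem the optimal policy solution
of SCAL can be obtained by solving the following Bellman equation [11], in the unknowns
$\lambda_{\omega,\theta} \in \mathbb{R}$ and $h_{\omega,\theta} : X_\omega \to \mathbb{R}$, where $X_\omega = Z_\omega \times \{0, 1\}$:

$$\lambda_{\omega,\theta} + h_{\omega,\theta}(x_\omega) = \mathop{E}_{w_\omega} \left[ \min_{u_\omega} \left\{ G_{\omega,\theta}(x_\omega, w_\omega) + h_{\omega,\theta}(f_\omega(x_\omega, u_\omega, w_\omega)) \right\} \right] \quad \forall x_\omega \in X_\omega . \tag{21}$$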

The solution can be computed in time polynomial in $|Z_\omega|$ (the problem is P-complete [41]).
The optimal policy can be obtained as $\mu_{\omega,\theta}(x_\omega, w_\omega) = \arg\min_{u_\omega} \{h_{\omega,\theta}(f_\omega(x_\omega, u_\omega, w_\omega))\}$. From
the general theory, the cost of the optimal policy $\mu_{\omega,\theta}$ (for the instantaneous cost $G_{\omega,\theta}$) is
$J_{\omega,\theta} = \lambda_{\omega,\theta}$. Standard methods permit the polynomial time computation of the values $J_{oc,\omega}$
and $J_{ms,\omega}$ of the same policy for the instantaneous costs $g_{oc,\omega}$ and $g_{ms,\omega}$, respectively.



3.2 The Average Occupancy of All Items 

The solution for the average occupancy problem of all items can be obtained by choosing,
for each item $\omega$, an appropriate solution (given in general by an RMoP) to the single item
problem. In fact, let $\mu$ be an optimal policy for the all items problem; then $\mu$ induces an
occupancy cost $J_{oc,\omega}(\mu)$ and a miss rate cost $J_{ms,\omega}(\mu)$ for each item $\omega$. Now consider the single
item problem of finding a policy $\mu_\omega$ which minimizes $J_{ms,\omega}(\mu_\omega)$ while achieving occupancy
$J_{oc,\omega}(\mu_\omega) = J_{oc,\omega}(\mu)$: then we must necessarily have $J_{ms,\omega}(\mu_\omega) = J_{ms,\omega}(\mu)$, otherwise either $\mu$
or $\mu_\omega$ would be suboptimal. Because of this relationship between single and all items optimal
policies, the average occupancy problem for all items reduces to a convex resource allocation
problem, which can be solved efficiently [50] [52] [21], once we have the approximation of the
set of SEPs for each single item.

A quantity of pivotal importance is the marginal gain $\rho_\omega$ of policy $\mu'_\omega$ w.r.t. policy $\mu_\omega$,
defined as

$\rho_\omega \triangleq \frac{J_{ms,\omega}(\mu_\omega) - J_{ms,\omega}(\mu'_\omega)}{J_{oc,\omega}(\mu'_\omega) - J_{oc,\omega}(\mu_\omega)}$ . (22)

The solution can be obtained in a greedy fashion by repeatedly allocating buffer capacity to
the item which provides the largest marginal gain. This greedy procedure takes time polynomial
in $|Z_\omega|$; a more sophisticated, asymptotically optimal algorithm can be found in [24].
Note that we can obtain the global miss rate $M$ and buffer occupancy $C$ just by adding up the
equivalent quantities for the single items:

$M = \sum_\omega J_{ms,\omega}$ , $C = \sum_\omega J_{oc,\omega}$ . (23)

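Theorem 1. Let $J(\mu_\omega^1) = (\eta_\omega^1, \zeta_\omega^1)$, $J(\mu_\omega^2) = (\eta_\omega^2, \zeta_\omega^2)$, $\ldots$ be the list, with $\eta_\omega^j < \eta_\omega^{j+1}$, of the
SEPs for the generic item $\omega$. The optimal policies $\mu_\omega$ (given for each item) are obtained by
applying Algorithm 1.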



Algorithm 1: Obtaining the optimal algorithm for multiple items.

1  ∀ω: c[ω] ← 1;
2  B ← ...;
3  while B < C do
       // choose the policy improvement with maximum marginal gain
       ...
8  γ ← ... such that ... = γ η_ω^{c[ω]-1} + (1 - γ) η_ω^{c[ω]};
9  μ_ω ← rand_γ(μ_ω^{c[ω]-1}, μ_ω^{c[ω]});



Remark 1. Algorithm 1 will produce a non-mixture eviction policy for each item except for
one, for which an RMoP may be needed in order to use the entire buffer space assigned to it.
Note that, given a target value $C$ for the buffer occupancy, there exists a global policy made
only of single item non-mixture policies which has average buffer occupancy $C'$, with $C - 1 < C' \le C$,
and is optimal for that value of the occupancy.
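
The greedy idea behind Algorithm 1 can be illustrated with a short Python sketch. It is not a transcription of the algorithm: it assumes that each item's SEPs are given as a list of (occupancy, miss rate) pairs sorted by increasing occupancy and lying on a convex, decreasing curve, and the names `greedy_allocate`, `seps`, and `capacity` are hypothetical. Upgrades are taken in order of decreasing marginal gain (22); the last upgrade may be only partially affordable, in which case it becomes a mixture, as in Remark 1.

```python
import heapq

def greedy_allocate(seps, capacity):
    """Illustrative greedy allocation (assumed input format, see lead-in).

    seps[w] is the list of (occupancy, miss) pairs of item w.
    Returns (choice, used, miss) where choice[w] = (index, gamma): item w
    uses policy `index`, mixed with policy `index - 1` chosen with
    probability gamma (gamma = 0.0 means a non-mixture policy).
    """
    level = [0] * len(seps)
    used = sum(points[0][0] for points in seps)
    miss = sum(points[0][1] for points in seps)
    choice = [(0, 0.0)] * len(seps)
    heap = []

    def push_next_upgrade(w):
        j = level[w]
        if j + 1 < len(seps[w]):
            (e0, z0), (e1, z1) = seps[w][j], seps[w][j + 1]
            # marginal gain: miss rate saved per unit of extra occupancy
            heapq.heappush(heap, (-(z0 - z1) / (e1 - e0), w))

    for w in range(len(seps)):
        push_next_upgrade(w)

    while heap and used < capacity:
        _, w = heapq.heappop(heap)
        j = level[w]
        (e0, z0), (e1, z1) = seps[w][j], seps[w][j + 1]
        step = e1 - e0
        if used + step <= capacity:
            level[w] = j + 1
            used += step
            miss += z1 - z0
            choice[w] = (j + 1, 0.0)
            push_next_upgrade(w)
        else:
            # only a fraction of the upgrade fits: mix the two policies
            frac = (capacity - used) / step
            miss += frac * (z1 - z0)
            used = capacity
            choice[w] = (j + 1, 1.0 - frac)
            break
    return choice, used, miss
```

With two items and `capacity = 1.0`, the item with the steeper first upgrade is served first and the remaining half unit of occupancy is spent on a mixture for the other item.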

3.3 Buffer Partitioning 

An interesting application of the previous algorithm arises when we want to partition a buffer
of capacity $C$ among $n$ independent processes, each described by a different HMRM. We
assume that the $i$-th process accesses a private address space of size $V_i$ and that, at each step, the $i$-th
process has probability $\pi_i$ (with $\sum_{i=1}^{n} \pi_i = 1$) to be the one to run. We want to determine the
capacity $C_i$ to devote to process $i$ (with $\sum_{i=1}^{n} C_i = C$) in order to minimize the global miss rate
over an infinite temporal horizon, under the hypothesis that each process is using the optimal
eviction policy.

It is straightforward to see that this problem is equivalent to building the global policy
from single items with the cost $J_{ms}$ of the items accessed by the $i$-th process rescaled by a
factor $\pi_i$.
4 Optimal Policies for the LRUSM

In this section, we focus on stack optimal policies for the LRU Stack Model. After first
reviewing the LRUSM (§4.1), we derive the optimal policy in the average occupancy framework
(§4.2); due to the strong LRUSM structure, many simplifications apply to the general procedure
built in the previous section, leading to a very efficient way of computing the optimal policy
(linear in $V$). In §4.3 we focus on the fixed occupancy eviction problem and, by linking its
solution to the average occupancy one, we are able to provide a stack eviction policy (Least
Profit Rate, LPR) which is optimal for the LRUSM. Furthermore, we can fully characterize
LPR behavior using a priority function based on a notion of profit which might also be of
interest for other memory reference models.

4.1 The LRU Stack Model 

Mattson et al. [36] observed that a number of eviction policies of interest, including LRU,
MRU, and OPT, satisfy the following property.

Definition 1. Given an eviction policy $\mu$ defined for all buffer capacities, let $B_t^\mu(C)$ be the
content of the buffer of capacity $C$ at time $t$, after processing references $a_1, \ldots, a_t$. We say that
the inclusion property holds at time $t$ if, for any $C > 1$, $B_t^\mu(C - 1) \subseteq B_t^\mu(C)$, with equality
holding whenever the bigger buffer is not full ($|B_t^\mu(C)| < C$). We say that $\mu$ is a stack policy
if it satisfies the inclusion property at all times for all address traces, assuming that inclusion
holds for the initial buffers $B_0^\mu(C)$, with $1 \le C \le V$ (this is trivially verified if we assume, as
we do, the initial buffers to be empty).

The optimal on-line policy LPR, developed in this section, is a stack policy. Inclusion
protects from Belady's anomaly [8] (i.e., increasing the buffer capacity cannot lead to a worse
miss ratio, as can happen with, e.g., FIFO) and enables a compact representation of the
content of the buffers of all capacities by the stack of the policy, an array $\Lambda_t^\mu$ whose first $C$
components yield the buffer of capacity $C$ as

$B_t^\mu(C) = [\Lambda_t^\mu(1), \ldots, \Lambda_t^\mu(C)]$ . (24)

The stack depth $d_t$ of an access $a_{t+1}$ is defined as its position in the policy stack at time $t$, so
that

$a_{t+1} = \Lambda_t^\mu(d_t)$ . (25)

Upon an access of depth d, a buffer incurs a miss if and only if C < d. Thus, computing the 
stack depth is an efficient way to simultaneously track the performance of all buffer capacities. 
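
As an illustration of this observation, the following Python sketch computes LRU stack depths for a trace and derives from them the number of misses for every capacity at once. It is a plain list-based version (linear time per access, not the logarithmic-time structures discussed later); the function names are hypothetical, and first references are treated as misses for every capacity, which corresponds to starting from empty buffers.

```python
def lru_stack_depths(trace):
    """Return the LRU stack depth of each access (None for a first reference)."""
    stack = []   # stack[0] is the most recently used item
    depths = []
    for item in trace:
        if item in stack:
            d = stack.index(item) + 1   # 1-based position in the stack
            stack.pop(d - 1)
        else:
            d = None                    # cold miss: item not yet in the stack
        stack.insert(0, item)           # accessed item moves to the top
        depths.append(d)
    return depths

def misses_per_capacity(depths, max_capacity):
    """misses[C] = number of accesses with depth > C, for C = 1..max_capacity."""
    hist = [0] * (max_capacity + 2)
    cold = 0
    for d in depths:
        if d is None:
            cold += 1
        else:
            hist[min(d, max_capacity + 1)] += 1
    misses = [0] * (max_capacity + 1)   # index 0 is unused
    running = cold + hist[max_capacity + 1]
    for C in range(max_capacity, 0, -1):
        misses[C] = running             # accesses deeper than C
        running += hist[C]
    return misses
```

For the cyclic trace `a b c a b c`, every re-reference has depth 3, so only a buffer of capacity 3 obtains any hit.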

In the LRU stack, which we shall denote just by $\Lambda$ (i.e., $\Lambda = \Lambda^{LRU}$), the items are ordered
according to the time of their most recent access; in particular, $\Lambda_{t+1}(1) = a_{t+1}$. Upon an
access at depth $d_t$, the LRU stack is updated by a downward, unit cyclic shift of its prefix of
length $d_t$, as illustrated in Fig. 3. The LRU stack has inspired an attractive stochastic model
for the address trace [39] [49] [15] [19] [30], where the access depths $d_1, d_2, \ldots$ are independent
and identically distributed random variables, specified by the distribution

$s(j) \triangleq \Pr[a_{t+1} = \Lambda_t(j)] = \Pr[d_t = j]$, $j = 1, \ldots, V$ , (26)

or, equivalently, by the cumulative sum

$S(j) = \sum_{i=1}^{j} s(i)$, $j = 1, \ldots, V$ . (27)

Figure 3: LRU stack update.

We assume that an "initial" LRU stack $\Lambda_0$ is given. W.l.o.g., $s(V) \ne 0$ ($s(V) = 0$ implies that the
last page in the stack is never accessed and can therefore be ignored).

For example, the case where $s(j)$ decreases with $j$ captures a strict form of temporal
locality, where the probability of accessing an item strictly decreases with the time elapsed
from its most recent reference. It is simple to see that the actual trace $a_1, a_2, \ldots$ can be
uniquely recovered from the stack-depth sequence $d_1, d_2, \ldots$, given the initial stack $\Lambda_0$.
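
A minimal Python sketch of this correspondence follows: given an initial stack and a sequence of depths, the trace is rebuilt by repeatedly moving the item at the accessed depth to the top. The helper `sample_depths`, which draws i.i.d. depths from a distribution `s`, and all function names are assumptions introduced only for illustration.

```python
import random

def trace_from_depths(initial_stack, depths):
    """Rebuild the trace from 1-based LRU stack depths and the initial stack."""
    stack = list(initial_stack)      # stack[0] is the top of the LRU stack
    trace = []
    for d in depths:
        item = stack.pop(d - 1)      # item currently at depth d is accessed
        stack.insert(0, item)        # unit cyclic shift of the prefix of length d
        trace.append(item)
    return trace

def sample_depths(s, n, seed=0):
    """Draw n i.i.d. depths, where s[j-1] is the probability of depth j."""
    rng = random.Random(seed)
    return rng.choices(range(1, len(s) + 1), weights=s, k=n)
```

For instance, `trace_from_depths(['a', 'b', 'c'], sample_depths([0.5, 0.3, 0.2], 10))` yields one possible LRUSM trace of ten references over three items.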

To summarize, in the notation of Section 2, the state underlying the trace is the LRU
stack ($z_t = \Lambda_t$) and the disturbance is the stack distance ($w_t = d_t$). We have a HMRM,
since $a_{t+1} = r(\Lambda_{t+1}) = \Lambda_{t+1}(1)$. The transition function $\phi$ for the trace state (such that
$\Lambda_{t+1} = \phi(\Lambda_t, d_t)$) corresponds to the right unit cyclic shift of the prefix of length $d_t$ of
stack $\Lambda_t$. The control input $u_t$, the buffer state $b_t$, and the action of the former on the latter
($b_{t+1} = \phi(b_t, u_t)$) are according to the general definitions given in Section 2.

4.1.1 K-L Eviction Policies 

A first optimal policy for the LRUSM was given by Wood, Fernandez and Lang in [56] [57].
They introduced the K-L eviction policy, defined as follows, in terms of LRU stack distance,
for given integers $K$ and $L$ such that $1 \le K \le C \le L \le V$. If access $a_{t+1}$ results in a miss, then

• evict $\Lambda_{t+1}(L + 1)$, if it is in buffer $B_t(C)$;

• otherwise evict $\Lambda_{t+1}(K + 1)$, which is always in $B_t(C)$ if the policy is consistently applied
starting from an empty buffer.

Here, $\Lambda$ denotes the LRU stack, while $B(C)$ denotes the content of the buffer under the K-L
policy. Eviction is specified in terms of the LRU stack immediately after the rotation due to
access $a_{t+1}$. Special cases are LRU ($K = C$, $\forall L$) and MRU ($K = 1$, $L = V$). It has been
shown in [56] [57] that, under the LRUSM, for any $C$, there exist values $K(C)$ and $L(C)$ for
which the K-L policy is (gain) optimal. It is easily shown that, in the steady state, the items in
the top $K$ positions of the LRU stack are always in the buffer, the items in the bottom $V - L$
positions are always outside the buffer, and $C - K$ of the $L - K$ items between positions $K + 1$
and $L$ are in the buffer. (Technically, the K-L policy is not specified for a buffer containing
neither $\Lambda_{t+1}(L + 1)$ nor $\Lambda_{t+1}(K + 1)$. However, if the least recently used item is evicted in
such a case, one can show that the same steady state is reached, with probability 1.) It turns
out that the miss rate of the K-L policy for a given distribution $s$ is

$M^{K\text{-}L}[s] = 1 - \frac{S(K)(L - C) + S(L)(C - K)}{L - K}$ . (28)

Finding, for each $C$, the optimal parameters $K(C)$ and $L(C)$ can then be accomplished in
time $\Theta(V^2)$, by evaluating the miss ratio for all possible $(K, L)$ pairs. In general, for a given
$C$, the miss rate can be minimized by different $(K, L)$ pairs. Smaragdakis et al. [47] [48] proved
that, for any given $s$, $K(C)$ and $L(C)$ can be chosen so that the resulting K-L policy does
satisfy the inclusion property.
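
The quadratic search over $(K, L)$ pairs mentioned above can be sketched in a few lines of Python, using the K-L miss rate formula (28). The function names are hypothetical, pairs with $K = L$ are skipped to avoid a zero denominator, and ties are resolved in favor of the first pair found, which is only one of several possible conventions.

```python
from itertools import accumulate

def kl_miss_rate(S, K, L, C):
    """Miss rate of the K-L policy; S[j] is the cumulative sum S(j), S[0] = 0.

    Requires K <= C <= L and K < L.
    """
    return 1.0 - (S[K] * (L - C) + S[L] * (C - K)) / (L - K)

def best_kl(s, C):
    """Brute-force search of the (K, L) pair minimizing the miss rate for capacity C."""
    V = len(s)
    S = [0.0] + list(accumulate(s))
    best = None
    for K in range(1, C + 1):
        for L in range(max(C, K + 1), V + 1):
            m = kl_miss_rate(S, K, L, C)
            if best is None or m < best[0] - 1e-12:
                best = (m, K, L)
    return best   # (miss rate, K, L)
```

For the increasing distribution `s = [0.1, 0.1, 0.8]` and `C = 2`, the search selects `K = 1`, `L = 3` (the MRU special case), with miss rate 0.45 against 0.8 for LRU.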

In the next subsection, we derive the optimal eviction policy in the average occupancy
framework developed in Section 3 and highlight its deep structure. In §4.3 we derive a stack
policy which is optimal in the fixed buffer model. By exploiting its similarity with the average
occupancy optimal, we develop an algorithm that computes the two $K$ and $L$ parameters for
all $C \in \{1, \ldots, V\}$ in linear time, whereas a straightforward computation that does not exploit
the stack policy characterization takes cubic time (a first speedup from cubic to quadratic is
obtained by exploiting the inclusion property, while the improvement from quadratic to linear
derives from algorithmic refinements).



4.2 Average Occupancy Problem 

The general procedure described in the previous section can be specialized for the LRUSM as
follows. For each item $\omega$ the associated CG is a Markov chain $z_{\omega,t}$ with states $Z_\omega = \{1, \ldots, V\}$.
The state encodes the position of the item in the LRU stack, thus satisfying the following
transition probabilities:

$\forall i$: $\Pr[z_{\omega,t+1} = 1 \mid z_{\omega,t} = i] = s(i)$ (29)
$\forall i \ne 1$: $\Pr[z_{\omega,t+1} = i \mid z_{\omega,t} = i] = S(i - 1)$ (30)
$\forall i \ne V$: $\Pr[z_{\omega,t+1} = i + 1 \mid z_{\omega,t} = i] = 1 - S(i)$ . (31)

We also have $r_\omega(1) = 1$ and, $\forall i \ne 1$, $r_\omega(i) = 0$. Thus, in the LRUSM the statistical description
of the CGs is the same for all the items. The stationary eviction policies for the single item
are binary vectors of size $V$ which say, for each LRU stack depth, if an item arriving at that
position should be evicted. The "hit" state $x^* = (z^* = 1, \beta = 1)$, in which the item is at the
top of the LRU stack and in the buffer, is recurrent and hence will be used to define RMoPs
in the LRUSM: we assume an RMoP will choose a SEP policy each time the system leaves the
state $x^*$. In the following we call lifetime the time between two consecutive visits to $x^*$. A lifetime
ends when the item is accessed. Thus, at the beginning of a new lifetime, an item is always at
the top of the stack and in the buffer. If evicted from the buffer, the item will re-enter only at
the beginning of the next lifetime.
We define $\mathrm{EV}_k$ as the policy which keeps the item in the buffer while at stack depth $i \le k$
and evicts it if $i > k$. Note that, in steady state, any policy $\mu$ is equivalent to an appropriate
$\mathrm{EV}_k$, where $k + 1$ is the smallest depth at which the policy $\mu$ evicts. Hence, w.l.o.g., we can
limit our study to $\mathrm{EV}_k$ policies. To provide a closed form for both the occupancy and the miss
cost of $\mathrm{EV}_k$, we first need to establish some properties of the Markov chain $z_\omega$.

Proposition 2. Let $t^j_{j+h}$ be the expected time spent by an item in position $j + h$ ($h \ge 0$),
conditioned on the fact that the item surely arrives in position $j$ without getting accessed. Then

$t^j_{j+h} = \frac{1}{1 - S(j-1)}$ . (32)

(Note that the expected time does not depend on $h$.)

Proof. For $h = 0$ we easily have that

$t^j_j = 1 + S(j-1)\, t^j_j \;\Rightarrow\; t^j_j = \frac{1}{1 - S(j-1)}$ . (33)

The event that an item starting from position $j$ does not arrive in position $j + 1$ is the disjoint
union of the events that there are exactly $h$ accesses with depth smaller than $j$ followed by
one at depth $j$. So the probability for an item in position $j$ to arrive at position $j + 1$ is

$P^j_{j+1} = 1 - s(j) \sum_{h=0}^{+\infty} S(j-1)^h = 1 - \frac{s(j)}{1 - S(j-1)} = \frac{1 - S(j)}{1 - S(j-1)}$ , (34)

and hence the expected time spent in position $j + 1$ is

$t^j_{j+1} = P^j_{j+1}\, t^{j+1}_{j+1} = \frac{1 - S(j)}{1 - S(j-1)} \cdot \frac{1}{1 - S(j)} = t^j_j$ . (35)

Furthermore

$t^j_{j+h} = P^j_{j+1} P^{j+1}_{j+2} \cdots P^{j+h-1}_{j+h}\, t^{j+h}_{j+h} = t^j_j$ . (36)

□

As a special case, if we know that an item starts from position 1 (in which it is going to
spend exactly one time step in the current lifetime), then it is going to spend an expected time
$t_j = 1$ in each position of the LRU stack in its lifetime and hence a lifetime has an expected
length of $V$ timesteps. Under policy $\mathrm{EV}_k$, the item is buffered on average for $k$ timesteps per
lifetime and hence the occupancy cost is given by

$J_{oc}(\mathrm{EV}_k) = \frac{k}{V}$ . (37)

As for the miss rate, since each position at depth $j > k$ contributes with a probability $s(j)$
to the misses, we have that

$J_{ms}(\mathrm{EV}_k) = \frac{1 - S(k)}{V}$ . (38)
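
These closed forms can be checked numerically against the single-item Markov chain with transition probabilities (29)-(31). The Python sketch below builds the transition matrix, approximates its stationary distribution by power iteration (an assumption that is adequate when $s(1) > 0$, so that the chain is aperiodic), and evaluates the occupancy and miss costs of $\mathrm{EV}_k$ from it. All function names are hypothetical.

```python
from itertools import accumulate

def transition_matrix(s):
    """Position chain of a single item: P[i-1][j-1] = Pr[next = j | current = i]."""
    V = len(s)
    S = [0.0] + list(accumulate(s))
    P = [[0.0] * V for _ in range(V)]
    for i in range(1, V + 1):
        P[i - 1][0] += s[i - 1]              # item accessed: back to the top
        if i != 1:
            P[i - 1][i - 1] += S[i - 1]      # access above the item: it stays
        if i != V:
            P[i - 1][i] += 1.0 - S[i]        # access below the item: it moves down
    return P

def stationary(P, iters=2000):
    """Approximate the stationary distribution by repeated multiplication."""
    V = len(P)
    pi = [1.0] + [0.0] * (V - 1)
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(V)) for j in range(V)]
    return pi

def ev_costs(s, k):
    """Occupancy and miss costs of EV_k computed from the stationary distribution."""
    pi = stationary(transition_matrix(s))
    j_oc = sum(pi[:k])                                        # time spent at depth <= k
    j_ms = sum(pi[j] * s[j] for j in range(k, len(s)))        # accesses at depth > k
    return j_oc, j_ms
```

For `s = [0.5, 0.3, 0.2]` the stationary distribution is uniform, and `ev_costs(s, 2)` agrees with $k/V = 2/3$ and $(1 - S(2))/V = 0.2/3$.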
A direct application of Algorithm 1 would provide optimal policies identical for all the
items except for one (which, in general, will be managed by an RMoP). Actually, we can take
a slightly different approach to exploit the symmetry of the CGs: by forcing the eviction policy to be
the same for all the items, the global problem is reduced to the single-item one, with
just a rescaling of the costs $J_{oc}$ and $J_{ms}$ by a factor of $V$. The optimal policy is then an RMoP
(the same for all the items) which mixes two eviction points corresponding to SEPs, causing
each item to be evicted only at two possible stack depths. The next theorem summarizes the
preceding discussion.

Theorem 2. Let $q_1 = 1 < q_2 < \cdots < q_l = V$ be the values of $k$ such that $\mathrm{EV}_k$ is a SEP
(i.e., a Pareto-convex) policy and let $q_i \le C < q_{i+1}$, so that $C = \gamma' q_i + (1 - \gamma') q_{i+1}$ for
some $\gamma' \in (0, 1]$. Then, the RMoP that, for each item, mixes policies $\mathrm{EV}_{q_i}$ and $\mathrm{EV}_{q_{i+1}}$ with
probability $\gamma'$ and $(1 - \gamma')$, respectively, achieves optimal miss rate for average occupancy $C$.

It is a simple exercise to show that, if $s$ is monotonically decreasing, then $l = V$ and $q_i = i$.
Instead, if $s$ is monotonically increasing, then $l = 2$, $q_1 = 1$, and $q_2 = V$.
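
The statement can be made concrete with a small Python sketch. It assumes that the SEP depths $q_i$ are the vertices of the upper concave hull of the points $(k, S(k))$, which follows from the costs (37) and (38), and that collinear points are dropped; the hull is computed here with a simple stack-based scan, and the function names are hypothetical.

```python
from itertools import accumulate

def sep_depths(s):
    """Depths k such that (k, S(k)) is a vertex of the upper concave hull."""
    S = [0.0] + list(accumulate(s))
    hull = []
    for k in range(1, len(s) + 1):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # drop b if it lies on or below the chord from a to k
            if (S[b] - S[a]) * (k - a) <= (S[k] - S[a]) * (b - a):
                hull.pop()
            else:
                break
        hull.append(k)
    return hull

def mixed_policy(s, C):
    """Return (q_i, q_next, gamma, miss) for a target average occupancy C."""
    S = [0.0] + list(accumulate(s))
    q = sep_depths(s)
    for lo, hi in zip(q, q[1:]):
        if lo <= C < hi:
            gamma = (hi - C) / (hi - lo)     # probability of using EV_lo
            miss = gamma * (1.0 - S[lo]) + (1.0 - gamma) * (1.0 - S[hi])
            return lo, hi, gamma, miss
    return q[-1], q[-1], 1.0, 1.0 - S[q[-1]]  # C equal to V: everything is buffered
```

For a decreasing distribution every depth is a SEP, while for an increasing one only depths 1 and $V$ survive, in agreement with the exercise above.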

4.3 Fixed Occupancy Problem 

We have just shown that the optimal eviction policy in the average occupancy model is
characterized by two eviction points in the LRU stack (let us call them $K'(C)$ and $L'(C)$). It
is natural to investigate what happens by applying a K-L policy in the fixed occupancy model
with $K = K'(C)$ and $L = L'(C)$. Under the K-L policy each lifetime is managed either by
policy $\mathrm{EV}_K$ or $\mathrm{EV}_L$ and thus its occupancy $C$ and miss rate $M$ can be written as

$C = \gamma J_{oc}(\mathrm{EV}_K) + (1 - \gamma) J_{oc}(\mathrm{EV}_L)$ , $M = \gamma J_{ms}(\mathrm{EV}_K) + (1 - \gamma) J_{ms}(\mathrm{EV}_L)$ . (39)

A set of similar equations holds for the average occupancy setting:

$C' = \gamma' J_{oc}(\mathrm{EV}_{K'}) + (1 - \gamma') J_{oc}(\mathrm{EV}_{L'})$ , $M' = \gamma' J_{ms}(\mathrm{EV}_{K'}) + (1 - \gamma') J_{ms}(\mathrm{EV}_{L'})$ . (40)

Since $C = C'$, $K = K'$ and $L = L'$, we must have $\gamma' = \gamma$. Then, we also have $M = M'$, i.e.,
the miss rate achieved by the K-L policy for $K = K'(C)$ and $L = L'(C)$ is the same as that achieved
by the average occupancy optimal. Since the optimal miss rate for average occupancy $C$ is
obviously a lower bound for the miss rate under fixed occupancy $C$, the choice of parameters $K = K'(C)$ and
$L = L'(C)$ yields an optimal policy under fixed capacity.

Based on these observations and on Theorem 2, we can compute, for each positive integer
$C \le V$, values $K(C)$ and $L(C)$ for a gain optimal, fixed capacity policy.

Corollary 1. Given a stack-distance distribution $s$, the corresponding values $q_1, q_2, \ldots, q_l$ can
be computed in time $\Theta(V)$ by Algorithm 2, a specialization [10] [34] of the Graham scan [25]
to obtain the convex hull of planar sets of points. The same algorithm also computes optimal
$K(C)$ and $L(C)$ for all (integer) buffer capacities $1 \le C \le V$, uniquely determined from the
$q_i$'s by the relation

$K(C) = q_i \le C < q_{i+1} = L(C)$ . (41)
Algorithm 2: Computing $K(C)$ and $L(C)$ and Profit Rates priorities - Linear algorithm.
$K(C) = q_i \le C < q_{i+1} = L(C)$ for some $i$.

    // Graham scan specialization
    π[1] ← s(1); Δ[1] ← 1;
    π[V+1] ← 0; Δ[V+1] ← 1;
    for j ← V downto 2 do
        π[j] ← s(j); Δ[j] ← 1; n ← j + 1;
        while π[j]/Δ[j] ≤ π[n]/Δ[n] do
            π[j] ← π[j] + π[n];
            Δ[j] ← Δ[j] + Δ[n];
            n ← j + Δ[j];
    // Print segmentation and priorities
    j ← 1; i ← 1;
    while j ≤ V do
        print "q_i = " j + Δ[j] - 1;
        i ← i + 1;
        j ← j + Δ[j];
    j ← 2;
    while j ≤ V do
        print "ζ(j) = " π[j]/Δ[j];
        j ← j + 1;
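
A runnable Python rendering of the idea behind Algorithm 2 is sketched below. It should be read as an interpretation rather than a faithful transcription: the merge condition with ties merged (`<=`), the explicit guard against the sentinel position $V + 1$, and the names `profit_rates_and_segments`, `mass`, and `length` are assumptions. Scanning from the bottom of the stack, each position starts a block that absorbs the following blocks as long as they do not have a smaller average; the resulting block average at position $j$ is the maximum average of $s$ over the intervals starting at $j$.

```python
def profit_rates_and_segments(s):
    """Return (q, zeta): segment ends q_1..q_l and profit rates zeta[j], j >= 2.

    s[j-1] is the probability of stack depth j; zeta[0] and zeta[1] are None.
    """
    V = len(s)
    mass = [0.0] * (V + 2)    # mass[j]: probability mass of the block starting at j
    length = [1] * (V + 2)    # length[j]: number of positions in that block
    mass[1] = s[0]
    for j in range(V, 1, -1):
        mass[j] = s[j - 1]
        n = j + 1
        # absorb the next block while its average is not smaller than ours
        while n <= V and mass[j] * length[n] <= mass[n] * length[j]:
            mass[j] += mass[n]
            length[j] += length[n]
            n = j + length[j]
    q = [1]                   # q_1 = 1: the top of the stack is a segment by itself
    j = 2
    while j <= V:
        j += length[j]
        q.append(j - 1)       # last position of the block just traversed
    zeta = [None, None] + [mass[j] / length[j] for j in range(2, V + 1)]
    return q, zeta
```

Every merge removes one block and every iteration of the outer loop creates one, so the total number of merges is at most $V$ and the running time is linear in $V$.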
4.3.1 The Least Profit Rate Eviction Policy

In this subsection we define a new stack policy, called Least Profit Rate (LPR), by means of
suitable priorities. We will see that, for any $C$, in steady state, the LPR policy becomes the
same as the $K(C)$-$L(C)$ policy, and is therefore optimal.

Definition 2. Let $\omega$ be an item identified by its stack depth $i$ (i.e., $\Lambda(i) = \omega$). We define its
profit rate $\zeta(i)$ as

$\forall i \ne 1$: $\zeta(i) \triangleq \max_{j \ge i} \bar{s}(i, j)$ . (42)

(The profit rate of the last accessed item $\Lambda(1)$ is not defined, since it cannot be evicted.)

Definition 3. Let $\bar{s}(i, j)$ be the average of $s$ between $i$ and $j$: $\bar{s}(i, j) = \frac{\sum_{k=i}^{j} s(k)}{j - i + 1}$. The LPR
policy evicts the item $i^*$ in the buffer such that

$i^* = \arg\min_i \zeta(i) = \arg\min_i \max_j \bar{s}(i, j)$ . (43)

(If more items achieve the minimum, then the closest to the top of the stack is chosen.)
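
The two definitions translate directly into a brute-force Python sketch: the profit rate of depth $i$ is the largest average of $s$ over intervals starting at $i$, and the victim is the buffered depth with the smallest profit rate. The function names are hypothetical, depth 1 is excluded because its profit rate is undefined, and ties are compared exactly on floating-point values, which is a simplification.

```python
def profit_rate(s, i):
    """Maximum over j >= i of the average of s(i), ..., s(j); depths are 1-based."""
    best, total = 0.0, 0.0
    for j in range(i, len(s) + 1):
        total += s[j - 1]
        best = max(best, total / (j - i + 1))
    return best

def lpr_victim(s, buffered_depths):
    """LRU stack depth of the item evicted by LPR among the buffered depths."""
    candidates = sorted(d for d in buffered_depths if d != 1)
    # min returns the first minimum, i.e. the depth closest to the top on ties
    return min(candidates, key=lambda d: profit_rate(s, d))
```

With `s = [0.4, 0.1, 0.3, 0.15, 0.05]`, the profit rates of depths 2 to 5 are 0.2, 0.3, 0.15 and 0.05, so a buffer holding depths {1, 2, 3} evicts depth 2 while one holding {1, 3, 4} evicts depth 4.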

Next, we can state the main results of this section: 

Theorem 3. The K{C)-L{C) eviction policies (for the various C) are uniquely determined 
by the LPR (whose definition does not depend on C). 
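Lemma 1. Let $s$ be a function from natural to positive real numbers, $s : \mathbb{N} \to \mathbb{R}^+$, and let $\bar{s}$ be
its moving average, $\bar{s}(j) = \bar{s}(1, j)$. Let $\lambda$ be the point of maximum for $\bar{s}$. Then for each $q < \lambda$ we have

$\bar{s}(\lambda) \le \bar{s}(q + 1, \lambda)$ . (44)

Proof. We can rewrite $\bar{s}(\lambda)$ as

$\bar{s}(\lambda) = \frac{1}{\lambda} \sum_{i=1}^{\lambda} s(i)$ (45)
$= \frac{1}{\lambda} \left[ \sum_{i=1}^{q} s(i) + \sum_{i=q+1}^{\lambda} s(i) \right]$ (46)
$= \frac{q}{\lambda}\, \bar{s}(q) + \frac{\lambda - q}{\lambda}\, \bar{s}(q + 1, \lambda)$ . (47)

Since $\bar{s}(\lambda)$ is a convex combination of $\bar{s}(q)$ and $\bar{s}(q + 1, \lambda)$, the following inequality holds:

$\min\{\bar{s}(q), \bar{s}(q + 1, \lambda)\} \le \bar{s}(\lambda) \le \max\{\bar{s}(q), \bar{s}(q + 1, \lambda)\}$ . (48)

Since by hypothesis we have $\bar{s}(\lambda) \ge \bar{s}(q)$, we must have

$\bar{s}(q) \le \bar{s}(\lambda) \le \bar{s}(q + 1, \lambda)$ . (49)

□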

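Lemma 2. Let $s$ be a function from natural to positive real numbers, $s : \mathbb{N} \to \mathbb{R}^+$, let $\bar{s}$ be its
moving average and $\bar{s}(q, j)$ be the average of $s$ between $q$ and $j$. Let $\lambda$ be the point of maximum
for $\bar{s}$. Then for each $r > \lambda$ we have

$\bar{s}(\lambda + 1, r) \le \bar{s}(\lambda)$ . (50)

Proof. We have the following convex combination:

$\bar{s}(r) = \frac{\lambda}{r}\, \bar{s}(\lambda) + \frac{r - \lambda}{r}\, \bar{s}(\lambda + 1, r)$ , (51)

with $\bar{s}(\lambda) \ge \bar{s}(r)$, so we must have

$\bar{s}(\lambda + 1, r) \le \bar{s}(r) \le \bar{s}(\lambda)$ . (52)

□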

Remark 2. Lemmas 1 and 2 highlight the following useful properties of the sequence
$(q_1, q_2, \ldots, q_l)$ obtained by Algorithm 2:

1. The average value of $s$ is strictly decreasing between segments: $\bar{s}(1 + q_i, q_{i+1}) > \bar{s}(1 + q_{i+1}, q_{i+2})$.

2. Within each segment the average of any prefix is smaller than or equal to that of any suffix:
$\forall k \in [1 + q_i, q_{i+1}]$, $\bar{s}(1 + q_i, k) \le \bar{s}(1 + q_i, q_{i+1}) \le \bar{s}(k, q_{i+1})$.

Proof of Thm. 3. Starting with an empty buffer, the first eviction occurs when the top $C$
positions of the LRU stack are filled. The position $i^* = \arg\min_i \max_j \bar{s}(i, j)$ is exactly $K + 1$
(as can be seen by applying Lemmas 1 and 2). The next position $l$ that has $\max_j \bar{s}(l, j) <
\max_j \bar{s}(i^*, j)$ is $L + 1$, so each time an item reaches that position it is evicted (it can reach it
only after a miss, since the positions after $L$ are not in the buffer). □

Below, we give a general formulation of the concepts of profit and of profit rate in the
HMRM. For the definition we adopt the single item average occupancy model. Given an item
$\omega$ and a time $t$, consider a single item eviction policy $\mu$ to determine, for any underlying state
$z$ of the CG, whether $\omega$ is kept in the buffer or evicted upon reaching that state. We call
$\mu$-profit of $\omega$ the probability $\pi_\mu$ that $\omega$ is referenced before it is evicted, which is a measure of
how useful it would be to keep $\omega$ in the buffer under the policy. Let then $t + \Delta$, with $\Delta > 0$,
be the earliest time after $t$ such that $\omega$ is either referenced or evicted at time $t + \Delta$. Clearly,
$E_\mu[\Delta]$ is a measure of the storage investment made on $\omega$ to reap that profit. Therefore, the
quantity $\pi_\mu / E_\mu[\Delta]$ is a measure of profit per unit time, under policy $\mu$. Finally, we call profit
rate of $\omega$ the maximum profit rate achievable for $\omega$, as a function of $\mu$. (It ought to be clear
that profits and profit rates depend upon the current underlying state $z$ of the CG, although
this dependence has not been reflected in the notation, for simplicity.)

As for the LRUSM, let us consider a buffered item at position $i$ in the LRU stack. If we
choose to keep it in the buffer until it goes past position $j$, then the item is going to spend on
average the same time in each position between $i$ and $j$ and hence its profit per unit time will
be $\bar{s}(i, j)$. This function has a maximum for some value of $j$ that defines the item profit rate,
as shown above.

The Least Profit Rate (LPR) policy evicts, upon a miss, a page in the buffer with minimum 
profit rate. Profit rates are independent of buffer capacity, hence can be viewed as priorities. 
If ties are resolved consistently for all buffer capacities, LPR satisfies the inclusion property. 
In general, LPR is a reasonable heuristic, but not necessarily an optimal policy in the fixed 
occupancy model. 



5 On-line vs. Off-line Optimality 

Intuitively, the optimal off-line policy makes the best possible use of the complete knowledge
of the future address trace, whereas the optimal on-line policy has only a statistical knowledge of
the future. We compare these two information conditions via the stochastic competitive ratio,
defined as the following functional of the distribution $s$:

$\chi[s] \triangleq \frac{M^{LPR}[s]}{M^{OPT}[s]}$ , (53)

where $M^{LPR}[s]$ and $M^{OPT}[s]$ denote the expected miss rates (technically, the limit of the
expected number of misses per step over an interval of diverging duration, as considered in
gain optimality). The next theorem states the key result of this section.

Theorem 4. For any stack access distribution $s$ and buffer capacity $C$, $\chi[s] = O(\ln C)$. If
$M^{LPR}[s] \ge \frac{1}{C}$, then the bound can be tightened as $\chi[s] \in O\left(\ln \frac{1}{M^{LPR}[s]}\right)$.
To gain some perspective on the $O(\ln C)$ bound for LPR, we observe that the stochastic
competitive ratio of LRU can be as high as $C$ (take $s$ with $s(C + 1) = 1$). We also remark that,
for classes of distributions where the miss rate of LPR is bounded from below by a constant,
the competitive ratio is bounded from above by a corresponding constant.

Theorem 4 is established through several intermediate results:

1. A lower bound $L^{OPT}[s]$ is developed for the (difficult to evaluate) quantity $M^{OPT}[s]$,
which yields a manageable upper bound to $\chi[s]$ (Prop. 3). (An analogous lower bound for
the deterministic case can be found in [40].)

2. An upper bound to the competitive ratio is evaluated for a quasi uniform distribution,
which is analytically tractable (Prop. 4).

3. Finally, the analysis of the stochastic competitive ratio of an arbitrary distribution $s$ is
reduced to that of a related, quasi uniform distribution $s'$ (Prop. 5).

Steps 1 and 3 lead to the following chain of inequalities:

$\chi[s] = \frac{M^{LPR}[s]}{M^{OPT}[s]} \le \frac{M^{LPR}[s]}{L^{OPT}[s]} \le \frac{M^{LPR}[s']}{L^{OPT}[s']}$ . (54)

Proposition 3. Under the LRUSM, the miss rate of OPT is bounded from below as $M^{OPT}[s] \ge
L^{OPT}[s]$, where, for $G \in \{1, \ldots, V - C\}$,

$L^{OPT}_G[s] \triangleq \frac{G}{\sum_{j=0}^{C+G-1} \frac{1}{1 - S(j)}}$ , $L^{OPT}[s] \triangleq \max_G L^{OPT}_G[s]$ . (55)

Proof. For a given value of $G$, we consider a partition of the trace $a_1, a_2, \ldots$ into consecutive
segments, each minimal under the constraint that it contains exactly $C + G$ distinct references.
Let $\tau_i$, a random variable, denote the number of steps in the $i$-th such segment. The random
variables $\tau_1, \tau_2, \ldots$ are statistically independent and identically distributed. Any one of them,
generically denoted $\tau$, can be decomposed as

$\tau = \sum_{j=0}^{C+G-1} \phi_j$ , (56)

where $\phi_j$ is the minimum number of steps, starting at some fixed time, to observe the first
access with stack distance greater than $j$. It can be easily seen that $\phi_j$ is a geometric random
variable, with parameter $p_j = 1 - S(j)$, $\Pr[\phi_j = k] = (1 - p_j)^{k-1} p_j$, expected value $1/p_j$, and
finite variance. By linearity of expectation we have:

$E[\tau] = \sum_{j=0}^{C+G-1} E[\phi_j] = \sum_{j=0}^{C+G-1} \frac{1}{1 - S(j)}$ . (57)

Under any policy, including OPT, in each of the intervals of durations $\tau_1, \tau_2, \ldots$, there occur
at least $G$ misses, since at most $C$ of the referenced items could be initially in the buffer.
Therefore, we can write the following chain of relations:

$M^{OPT}[s] \ge \lim_{q \to \infty} E\left[ \frac{qG}{\sum_{i=1}^{q} \tau_i} \right] \ge G \lim_{q \to \infty} E\left[ \frac{1}{\bar{\tau}_q} \right] = G\, E\left[ \lim_{q \to \infty} \frac{1}{\bar{\tau}_q} \right] = \frac{G}{E[\tau]}$ , (58)

where $\bar{\tau}_q = \frac{1}{q} \sum_{i=1}^{q} \tau_i$. The interchange between limit and expectation is justified because
(i) by the law of large numbers, $\bar{\tau}_q$ converges in distribution to the delta peaked at $E[\tau]$
and (ii) the function $1/x$ is continuous and bounded within the support of $\bar{\tau}_q$ (which is contained in
$\{x \in \mathbb{R} : x \ge C + G\}$ since, for each $i$, $\tau_i \ge C + G$) [35]. □
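
The lower bound of Proposition 3 is straightforward to evaluate numerically. The following Python sketch computes $L^{OPT}_G[s]$ for every admissible $G$ and returns the maximum; the function name is hypothetical and the code assumes $s(V) > 0$, so that $1 - S(j) > 0$ for all $j \le V - 1$.

```python
from itertools import accumulate

def opt_lower_bound(s, C):
    """Lower bound L^OPT[s] on the miss rate of OPT for buffer capacity C."""
    V = len(s)
    S = [0.0] + list(accumulate(s))   # S[j] = s(1) + ... + s(j), S[0] = 0
    best = 0.0
    for G in range(1, V - C + 1):
        # expected length of a segment with exactly C + G distinct references
        expected_tau = sum(1.0 / (1.0 - S[j]) for j in range(0, C + G))
        best = max(best, G / expected_tau)
    return best
```

For the distribution with $s(C + 1) = 1$ and $V = C + 1$ (here `s = [0, 0, 1]`, `C = 2`), the bound evaluates to $1/(C + 1) = 1/3$, slightly below the miss rate $1/C$ that OPT attains on the resulting cyclic trace.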

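Proposition 4. Let $s'$ be the distribution defined as

$s'(j) = \begin{cases} \alpha & j = 1 \\ \eta & j \in \{2, \ldots, V - 1\} \\ \eta' & j = V \end{cases}$ . (59)

Then the following lower bound for the OPT miss rate holds:

$M^{OPT}[s'] \ge L^{OPT}[s'] \ge \frac{V - C}{2} \left[ 1 + \frac{1}{\eta} \ln \frac{(V - 2)\eta + \eta'}{\frac{V - 2 - C}{2}\eta + \eta'} \right]^{-1}$ . (60)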

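Proof. Applying Prop. 3 we can write, for any $G \in \{1, \ldots, V - C\}$,

$L^{OPT}[s'] \ge G \left[ 1 + \sum_{j=1}^{C+G-1} \frac{1}{1 - S'(j)} \right]^{-1} = G \left[ 1 + \sum_{j=1}^{C+G-1} \frac{1}{\eta' + (V - j - 1)\eta} \right]^{-1} = G \left[ 1 + \sum_{k=V-C-G}^{V-2} \frac{1}{\eta' + k\eta} \right]^{-1}$ (61)

$\ge G \left[ 1 + \frac{1}{\eta} \ln \frac{(V - 2)\eta + \eta'}{(V - C - G - 1)\eta + \eta'} \right]^{-1}$ . (62)

By setting $G = \frac{V - C}{2}$ the thesis follows. □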



Lemma 3. If $\forall j$ $S_1(j) \ge S_2(j)$, then $L^{OPT}[s_1] \le L^{OPT}[s_2]$.

Proof. From the definition of $L^{OPT}_G$ we can see that it is decreasing with $\sum_{g=0}^{C+G-1} \frac{1}{1 - S(g)}$,
and hence decreasing with each $S(j)$. □



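Proposition 5. Let $s$ be an LRU stack access distribution, $C$ the buffer capacity and LPR its
associated optimal online policy. Consider $s'$ defined as follows:

$s'(j) = \begin{cases} \alpha & j = 1 \\ \eta & j \in \{2, \ldots, L + D\} \\ \eta' & j = L + D + 1 \\ 0 & j > L + D + 1 \end{cases}$ (63)

where $\eta = \bar{s}(K + 1, L)$, $\alpha = S(K) - (K - 1)\eta$, $D = \left\lceil \frac{1 - S(L)}{\eta} \right\rceil - 1$, $\eta' = 1 - S(L) - D\eta$ (note that
$0 < \eta' \le \eta$). Then $s'$ is a valid distribution, $L^{OPT}[s'] \le L^{OPT}[s]$, and $M^{LPR}[s'] = M^{LPR}[s]$.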
Proof. We begin by observing (due to Lemmas 1 and 2) that $\bar{s}(2, K) \ge \bar{s}(K + 1, L) \ge \bar{s}(L + 1, V)$.
The transformation from $s$ to $s'$ can be intuitively described as follows.

• We first "flatten" to $\eta = \bar{s}(K + 1, L)$ the access distribution within the segment $[K + 1, L]$
(in this operation no probability mass is moved outside the segment).

• We set $s(j) = \eta$ in the interval $[2, K]$. Since $\bar{s}(2, K) \ge \bar{s}(K + 1, L)$, some probability
mass is removed from $[2, K]$.

• We add to $s(1)$ the probability removed during the previous step.

• We redistribute the probability mass $1 - S(L)$ at positions starting from $L + 1$, assigning
$\eta$ per position, possibly followed by a leftover $\eta'$. This is possible because $\bar{s}(K + 1, L) \ge
\bar{s}(L + 1, V)$.

After all these movements the total probability mass is preserved and no position has a negative
value, hence $s'$ is still a valid distribution. Because of Lemmas 1 and 2 the transformation
yields $\forall j$ $S'(j) \ge S(j)$ and hence, because of Lemma 3, $L^{OPT}[s'] \le L^{OPT}[s]$. As for the LPR miss
rate we have

$M^{LPR}[s] = 1 - S(L) + (L - C)\eta = (L + D - C)\eta + \eta' = M^{LPR}[s']$ . (64)

□



Proof of Thm. 4. Because of Prop. 5, given a distribution $s$ we can obtain a quasi uniform
distribution $s'$ (with a narrowed memory space $W = L + D + 1 \le V$) such that

$\chi[s] \le \frac{M^{LPR}[s']}{L^{OPT}[s']} \le \frac{\eta' + (W - 1 - C)\eta}{W - C} \left[ 2 + \frac{2}{\eta} \ln \frac{(W - 2)\eta + \eta'}{\frac{W - 2 - C}{2}\eta + \eta'} \right]$ . (65)

To simplify the analysis of $\chi[s]$ we can assume $C > 1$ and $W - C > 2$ (the complementary
case is easy to deal with). Since $\eta' \le \eta \le 1$, Equation (65) reduces to

$\chi[s] \le O(1) + 2 \ln \frac{2(W - 1)}{W - 2 - C} \in O(\ln C)$ . (66)

Finally, since $M^{LPR}[s] \le (W - C)\eta$ and $\eta \le \frac{1}{W - 2}$ (since $s'(1) + \eta' + \eta(W - 2) = 1$), we have
$M^{LPR}[s] \le \frac{W - C}{W - 2}$, implying

$\chi[s] \in O\left( \ln \frac{1}{M^{LPR}[s]} \right)$ , (67)

which, for $M^{LPR}[s] \ge \frac{1}{C}$, provides a more descriptive bound than (66). □



6 Fast Simulation of the LPR Policy 

In experimental studies it is important to simulate eviction policies on sets of benchmark
traces. From the stack distances, the number of misses for all buffer capacities can be easily
derived in time $O(V)$. Previous work [9] [1] has shown how to compute the stack distance
for the LRU policy in time $O(\log V)$ per access. We derive an analogous result for the more
complex LPR policy.

Theorem 5. Given any stack-distance distribution $s$, the number of misses incurred by the
corresponding optimal LPR policy on an arbitrary trace of $N$ references can be computed,
simultaneously for all capacities, in time $O(V + N \log V)$.






The algorithm proposed to prove the preceding result exploits some relations between the 
LPR stack and the LRU stack and achieves efficiency by means of fast data structures. 



Proposition 6. Let s be a stack- distance distribution and let {qi,q2, ■ ■ ■ ,Qi) be the (increasing) 
sequence associated with s as in Theorem^ Let A be the LRU stack and let H be the LPR 
stack corresponding to s, both assumed initially equal. Ln either stack, let the i-th segment 
be the set of positions in interval Qi = [qi + 1, Qj+i]. Finally, for j G {1, 2, . . . , V}, let pt{j) 
denote the position in the LPR stack Hf of the item that is in position j of the LRU stack At, 
i.e., Ut{pt{j)) =At{j) . Then, we have: 



1. Segment equivalence. At any time t, segment Qi contains the same items in both 
stacks, that is, j G Qi if and only if pt{j) G Qi- 

2. Relative update. Upon an access at at LRU stack depth dt, if dt = 1, then the map p 
between the two stacks is unchanged. Otherwise (dt > 1), p is updated as follows. For 
any t, any segment Qi, with i = \, . . . ,1 — 1, and any h G {Oj^j+i — qi — 1}, (so that 
{qi + l + h) G Qi), we have: 



Pt+i{qi + l + h) = < 



pt{qi + l + h) dt<qi + l + h 

Pt (gi + 1 + {{h - 1) mod {dt - qi))) qi + I + h < dt < qi+i 
^pt{qi + I + {{h - 1) mod {qi+i - qi))) qi+i < dt . 

(68) 



In other words, (a) below the LRU point of access, $\rho$ does not change; (b) within the
prefix of the segment including the access up to the point of access itself, as well as (c)
within those segments that are entirely above the point of access, $\rho$ incurs a unit, right
cyclic shift.
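
The relative update can be sketched directly from its three cases. The snippet below is only an illustration, not the efficient data structure used later: it stores $\rho$ as a plain dictionary and touches every position. The function name `update_rho` and the convention that positions outside every segment are left unchanged are assumptions made for this sketch.

```python
def update_rho(rho, q, d):
    """Apply one step of the relative update to the map rho.

    rho: dict mapping LRU position j to LPR position rho_t(j) (1-based).
    q:   increasing boundaries [q_1, ..., q_l]; segment i is [q_i + 1, q_{i+1}].
    d:   LRU stack depth d_t of the current access.
    """
    if d == 1:
        return dict(rho)                # the map is unchanged
    new = dict(rho)                     # assumption: other positions stay put
    for lo, hi in zip(q, q[1:]):        # lo = q_i, hi = q_{i+1}
        for h in range(hi - lo):
            j = lo + 1 + h
            if d < j:                   # (a) below the point of access
                continue
            if d <= hi:                 # (b) prefix of the accessed segment
                src = lo + 1 + (h - 1) % (d - lo)
            else:                       # (c) segment entirely above the access
                src = lo + 1 + (h - 1) % (hi - lo)
            new[j] = rho[src]
    return new
```

With boundaries `q = [1, 3, 6]` and an access at depth 5, the segment [2, 3] is rotated as a whole, while in the segment [4, 6] only the prefix [4, 5] is rotated.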

Proof. Segment equivalence. We begin by observing that, for buffer sizes in the set
$\{q_1, q_2, \ldots, q_l\}$, the LPR policy coincides with LRU, whence $B^{LPR}(q_i) = B^{LRU}(q_i)$. In fact,
for a K-L policy, the buffer content is a subset of the first $L$ positions of the LRU stack. From
Corollary [T] if $C = q_i$, then $L(C) = q_i$, hence the LPR buffer content must equal that of the
first $q_i$ positions of the LRU stack which, by definition of stack, is also the content of the LRU
buffer of capacity $q_i$. The segment equivalence property follows since, for any stack policy, the
content of the stack in a segment $Q_i$ equals (by definition) the set-theoretic difference between
$B(q_{i+1})$ and $B(q_i)$.

Relative update. If $d_t = 1$, then neither stack changes, therefore $\rho_{t+1} = \rho_t$. Otherwise, let
$d_t \in Q_a$, that is, let $Q_a$ be the segment capturing the access and consider the following cases.

(i) For all buffer sizes $C > q_{a+1}$, the access is a hit both for the LRU and for the LPR
policy, hence both stacks, and consequently the $\rho$ map, remain unchanged at positions greater
than $q_{a+1}$. This is a subcase of case (a) in the statement and applies to all segments with
$i > a$ (hence $1 + q_i + h > q_{a+1}$).

(ii) For buffer sizes $C \in Q_a$ the situation is as follows. Under LRU, there is a miss for $C < d_t$,
so that in the LRU stack: the items in positions smaller than $d_t$ shift down by one position,
the item at $d_t$ goes to the top of the stack, and all other items retain their position. Under
LPR, let $C_a$ be the smallest capacity of a buffer containing the referenced item, $a_{t+1}$. Then, all
the buffers with $C < C_a$ will evict item $\Lambda_t(q_a)$, which will go to position $C_a$ of the LPR stack,
the item previously at $C_a$ will go to the top of the stack, while all remaining items in segment
$Q_a$ will retain their position. Consequently, for $h = (d_t + 1) - (q_a + 1), \ldots, q_{a+1} - (q_a + 1)$ we
have $\rho_{t+1}(q_a + 1 + h) = \rho_t(q_a + 1 + h)$, still a subcase of case (a) in the statement. Furthermore,
$\rho_{t+1}(q_a + 1) = \rho_t(d_t)$ and, for $h = 1, \ldots, d_t - (q_a + 1)$, $\rho_{t+1}(q_a + 1 + h) = \rho_t(q_a + 1 + h - 1)$,
which establishes case (b) in the statement.

(iii) For segments Qi with i < a, the argument is a straightforward adaptation of that 
developed for case (ii) and establishes case (c) of the statement. 

□ 

We are now ready to provide the algorithm for LPR stack-distance computation and its 
analysis. 

Proof of Thm. 5. The work of [9l H] has provided a procedure that, given the initial LRU
stack $\Lambda_0$ and the prefix $a_1, \ldots, a_t$ of the input trace, will output the LRU stack distances
$d_0, \ldots, d_{t-1}$, in time $O(V + t \log V)$. Below, we develop a representation of the map $\rho$ between
the LRU and the LPR stack, which can be updated and queried in time $O(\log V)$ per access.
Then the LPR stack distance of access $a_{t+1}$ can be obtained as $\rho(d_t)$.

By Proposition 6 we can represent $\rho$ by a separate sequence $(\rho(q_i + 1), \ldots, \rho(q_{i+1}))$ for
each segment. On such a sequence, we need to perform cyclic shifts of an arbitrary prefix
and to access element $\rho(q_i + 1 + h)$, given an $h \in [0, q_{i+1} - (q_i + 1)]$. Any of the well-known
dynamic balanced trees (AVL, 2-3, red-black, ...) [16] can be easily adapted to perform each
of the required operations in time $O(1 + \log(q_{i+1} - q_i)) = O(\log V)$. Hereafter, we denote by
$R_i$ the data structure for segment $Q_i$.

The number of segments where the map $\rho$ can change in one step can be $O(V)$, in the worst
case. Therefore, we adopt a lazy update strategy whereby only the segment $Q_a$ capturing the
access is actually updated; for the other segments, record is taken that a shift should be applied
to the sequence, without performing the shift itself. It is sufficient to increment a counter
storing the amount of shift that has to be applied to the sequence and then perform just one
global rotation when the segment is accessed. Still, individually incrementing each segment
counter could lead to work proportional to $V$ per step. Instead, we maintain an auxiliary tree
$T$ of counters which collectively serve the segments and where at most logarithmically many
counters need updating in a given step. A further field in the auxiliary tree will enable quick
identification of segment $Q_a$.

More specifically, the auxiliary tree is a static, balanced tree with $l - 1$ leaves corresponding,
from left to right, to segments $Q_1, \ldots, Q_{l-1}$. In each internal node, a search field contains the
maximum right boundary of any segment associated with a descendant of that node. In each
node, a counter field will be maintained so that, at the end of a step, the sum of the counters
over the ancestors of the $i$-th leaf represents the amount of shift to be applied to the $\rho$-sequence in
segment $Q_i$.

With the above data structures in place, the algorithm to process one access at+i is outlined 
next. 

1. From the LRU procedure, obtain the LRU stack distance $d_t$.

2. In the auxiliary tree $T$, with the help of the search field, traverse the path $\mathcal{P}$ from the
root to the leaf corresponding to the segment $Q_a$ which contains $d_t$.

3. While traversing $\mathcal{P}$, increment by one the counter of a left child $v$ of a visited node
whenever $v$ itself is not on $\mathcal{P}$. (This operation corresponds to incrementing the shift
count for all the segments to the left of $Q_a$, as required by Proposition 6.)

4. While traversing $\mathcal{P}$, add the counters of the visited nodes and apply a shift of the
resulting amount to $R_a$. Subtract such amount from the counter of the leaf for $Q_a$.

5. Read $\rho_t(d_t)$ from the $(d_t - q_a)$-th position of sequence $R_a$ and output this value as the
LPR stack distance of $a_{t+1}$.

6. Apply a unit right cyclic shift to the prefix of length $(d_t - q_a)$ of sequence $R_a$.

Each step in the outlined procedure can be accomplished in time $O(\log V)$, so that the overall
time for processing $N$ accesses is $O(V + N \log V)$, where the term $V$ accounts for the initial
setup of the data structures.

To prevent the counter in a node $\nu$ of the auxiliary tree from growing unbounded, we observe
that it is sufficient to maintain its value modulo the least common multiple of the lengths
of the segments associated with the descendant leaves of $\nu$. □
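
The lazy bookkeeping of pending shifts can be illustrated with a small sketch. Note that it swaps in a different structure from the one described in the proof: instead of a static balanced tree with counters on the ancestors of each leaf, it uses a Fenwick (binary indexed) tree over a difference array, which offers the same logarithmic cost for "add one pending shift to every segment left of $Q_a$" and "read the pending shift of $Q_a$". The class and method names are hypothetical, and the search field and the per-segment sequences $R_i$ are not modelled.

```python
class ShiftCounters:
    """Pending cyclic-shift counts for segments 1..n (Fenwick tree sketch)."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)       # Fenwick tree over a difference array

    def _add(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def shift_left_of(self, a):
        """Record one pending shift for every segment with index < a."""
        if a > 1:
            self._add(1, 1)
            self._add(a, -1)

    def pending(self, i):
        """Total pending shift of segment i (prefix sum of the differences)."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def clear(self, i):
        """Reset segment i once its pending shift has been applied."""
        p = self.pending(i)
        self._add(i, -p)
        self._add(i + 1, p)
```

In one step of the procedure, `shift_left_of(a)` plays the role of step 3, while `pending(a)` followed by `clear(a)` plays the role of step 4.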



7 On Finite Horizon 

In practice, when dealing with sufficiently long traces, a policy that is optimal over an infinite
horizon is likely to achieve near optimal performance. For shorter traces, transient effects
may play a significant role, whence the interest in optimal policies over a finite horizon. In
principle, the optimal policy can be computed by a dynamic-programming algorithm based on
(13), but the exponential number of states makes this approach of rather limited applicability.
An alternate, often successful route consists in guessing a closed-form characterization of a
policy $\pi$ and its corresponding optimal cost function $J^{\pi}(\cdot)$. Under very mild conditions, if
the guess satisfies (13), then $\pi$ is an optimal policy. Unfortunately, we have been unable to
find a tractable form for the optimal cost. Ultimately, we have circumvented this obstacle
for monotone stack-depth distributions, by realizing that what is really needed to make an
optimal choice between two states is not the absolute value of their costs, but rather their
relative value.

Theorem 6. Let $s$ be non-increasing, i.e., $s(j) \ge s(j + 1)$ for $j \in \{1, \ldots, V - 1\}$. Then, for any
finite horizon $\tau \ge 1$ and any initial buffer content, LRU is an optimal eviction policy.

Theorem 7. Let $s$ be non-decreasing, i.e., $s(j) \le s(j + 1)$ for $j \in \{1, \ldots, V - 1\}$. Then, for any
finite horizon $\tau \ge 1$ and any initial buffer content, MRU is an optimal eviction policy.

Thus, for monotone stack-depth distributions, the finite horizon optimal policy is time
invariant, hence it is also optimal over an infinite horizon. This property does not hold for
arbitrary distributions. In spite of the symmetry between the above two theorems, their proofs,
given in §7.1 and §7.2, require significantly different ideas.
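
On very small instances, the two theorems can be checked numerically by brute-force dynamic programming. The sketch below is an illustration under explicit modelling assumptions, not the authors' program: a state is a binary vector over LRU stack positions whose first entry is always 1 (the most recently accessed item is in the buffer), a miss loads the accessed item, and the evicted item is chosen among the other buffered items after the stack rotation. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def successors(x, d):
    """Miss indicator and reachable next states for an access at LRU depth d."""
    y = (x[d - 1],) + x[:d - 1] + x[d:]      # rotate the prefix of length d
    if x[d - 1]:                             # hit: no eviction needed
        return 0, [y]
    y = (1,) + y[1:]                         # miss: accessed item enters the buffer
    # One candidate per buffered item other than the one just loaded, ordered
    # by increasing depth (first = MRU choice, last = LRU choice).
    return 1, [y[:j] + (0,) + y[j + 1:] for j in range(1, len(y)) if y[j]]

def horizon_costs(s, C, horizon, choose):
    """Expected misses over `horizon` steps from every state, under `choose`."""
    V = len(s)
    states = [tuple(int(j == 0 or j in rest) for j in range(V))
              for rest in combinations(range(1, V), C - 1)]
    J = {x: Fraction(0) for x in states}
    for _ in range(horizon):
        nxt = {}
        for x in states:
            total = Fraction(0)
            for d in range(1, V + 1):
                miss, cands = successors(x, d)
                total += s[d - 1] * (miss + choose([J[y] for y in cands]))
            nxt[x] = total
        J = nxt
    return J

optimal = min                        # best eviction at every step
lru = lambda costs: costs[-1]        # evict the deepest buffered item
mru = lambda costs: costs[0]         # evict the item at depth 2
```

Comparing `horizon_costs(s, C, h, optimal)` with `horizon_costs(s, C, h, lru)` for a non-increasing `s` (or with `mru` for a non-decreasing `s`) shows whether the fixed policy attains the optimal finite-horizon cost from every state of the instance.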

We have extended Thm. 6 to the case of a (non-increasing) dependent stack-depth distribution,
where, given a prefix trace $\zeta$, the next stack distance is described by the following distribution,
assumed to be non-increasing for all $\zeta$:

$$ s_\zeta(i) = \Pr[d_t = i \mid \zeta] . \tag{69} $$

The details are given in §7.1.1, Thm. 8. For the case of an initially empty buffer, a result similar
to Thm. 8, but based on a stronger notion of optimality, is derived by Hiller and Vredeveld
[28], using different techniques. This generalization of the LRUSM has also been studied by
Becchetti [6], who provides sufficient conditions on and $C/V$ for the stochastic competitive
ratio of LRU against OPT to be $O(1)$.



7.1 Non-Increasing Access Distribution 

For the purposes of this section, the state description for the LRUSM developed in §4.1 can
be simplified, by unifying the representation of the LRU stack and of the buffer in a vector $x$
of $V$ binary components, where $x_t(j) = 1$ when the item at depth $j$ in the LRU stack is in the
buffer at time $t$ and $x_t(j) = 0$ otherwise. The disturbance is still the LRU depth of the access
($w_t = d_t$), while the control $u_t$ specifies the LRU depth of the item to be evicted. Denoting by
$f$ the state transition function, we have:

$$ x_{t+1} = f(x_t, d_t, u_t) . \tag{70} $$

With this representation, the well-known LRU policy amounts to evicting the item in the 
deepest position of the (resulting) stack, among those that are in the buffer: 

Definition 4. Let $R_d(x)$ denote the state resulting from applying a unit right cyclic shift to
the prefix of length $d$ of $x$ (strictly speaking, if $x(d) = 0$ then $R_d(x)$ is a pseudo-state, as it is
not in the admissible state set). The Least Recently Used (LRU) policy is defined (for a miss,
$x(d) = 0$) by

$$ \mathrm{LRU}(x, d) = \max\{j : y(j) = 1, \text{ where } y = R_d(x)\} . \tag{71} $$

Definition 5. We say that two states $y$ and $z$ form a critical pair and write $y <_c z$ if
their structure is related as follows, where $\nu$, $\iota$ are arbitrary and $\sigma \in 0^*$:

$$ y = 1\,\nu\,1\,\iota\,0\,\sigma , \qquad z = 1\,\nu\,0\,\iota\,1\,\sigma . \tag{72} $$

We also write $y \le_c z$ when $y = z$ or $y <_c z$.

Remark 3. A critical pair represents a choice between what the LRU policy would do
(obtaining $y$) and what a different eviction policy would do (obtaining $z$), when choosing the
item to evict after the stack rotation.

Lemma 4. The evolution of a critical pair under LRU preserves its criticality and order:

$$ \forall y\, \forall z : y <_c z \quad \forall d \qquad y' = f^{LRU}(y, d) \le_c z' = f^{LRU}(z, d) , \tag{73} $$
where $f^{LRU}(x, d) = f(x, d, \mathrm{LRU}(x, d))$.

Proof. We analyze the four possible cases: 

• Hit for both $y$ and $z$. The two stacks rotate and produce a critical pair with $y' <_c z'$.

• Miss for both $y$ and $z$. The two evictions in the last filled positions make the states
equal if $\iota \in 0^*$, otherwise they yield $y' <_c z'$.

• Hit for $y$ and miss for $z$. The eviction in $z$ yields $y' = z'$.

• Miss for $y$ and hit for $z$. The eviction in $y$ brings $y' = z'$ if $\iota \in 0^*$ and $y' <_c z'$ otherwise.

□ 






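Proposition 7. Let $s$ be monotonic non-increasing. Then LRU is the optimal eviction policy
for every time horizon $\tau$ and any initial buffer content. Furthermore:

$$ \forall y\, \forall z : y \le_c z \qquad J^*_\tau(y) \le J^*_\tau(z) . \tag{74} $$

Proof (by induction on $\tau$). Base case. For $\tau = 1$ we have

$$ \forall x \qquad J^*_1(x) = E_d[g(x, d)] = \bar{g}(x) , \tag{75} $$

which, using the monotonicity of $s$, yields

$$ \forall y\, \forall z : y \le_c z \qquad J^*_1(y) \le J^*_1(z) . \tag{76} $$

Induction. Assuming now that the statement holds for all $t < \tau$, we obtain

$$ \forall x \qquad J^*_\tau(x) = \bar{g}(x) + E_d\left[\min_u J^*_{\tau-1}(f(x, d, u))\right] = \bar{g}(x) + E_d\left[J^*_{\tau-1}(f^{LRU}(x, d))\right] . \tag{77} $$

Since $\bar{g}(y) \le \bar{g}(z)$ and since, by the inductive hypothesis and Lemma 4,

$$ J^*_{\tau-1}(f^{LRU}(y, d)) \le J^*_{\tau-1}(f^{LRU}(z, d)) , \tag{78} $$

we finally obtain $J^*_\tau(y) \le J^*_\tau(z)$. □

Proof of Thm. 6. Thm. 6 follows directly from Prop. 7. □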



7.1.1 Generalization to Dependent Processes 

Let $\mathcal{V} = \{1, 2, \ldots, V\}$, $\mathcal{V}^0 = \{\epsilon\}$ ($\epsilon$ being the null trace), $\mathcal{V}^{\le L} = \bigcup_{t=0}^{L} \mathcal{V}^t$ be the set of traces
of size no greater than $L$, and $X$ the state space (boolean vectors on the LRU stack). We
consider stochastic processes generating a trace of length $L$, such that, after having generated
a partial trace $\zeta$ of length $t - 1 = |\zeta|$, the probability distribution of the next access is specified
by

$$ s_\zeta(i) = \Pr[a_{t+1} = \Lambda_t(i) \mid \zeta] = \Pr[d_t = i \mid \zeta] . \tag{79} $$

The optimal cost achievable for such a process, given a partial trace $\zeta$, is, $\forall x \in X$,

$$ J^*_L(\zeta, x) = E_d\left[\min_u \left\{ g(x, d) + J^*_L(\zeta d, f(x, d, u)) \right\} \,\middle|\, \zeta \right] , \tag{80} $$

where the probability distribution of $d$ is a function of $\zeta$ as given in (79).



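Theorem 8. If $\forall \zeta$ $s_\zeta$ is a non-increasing function, then LRU is optimal for any initial buffer
content:

$$ \forall L \;\; \forall \zeta \in \mathcal{V}^{\le L} \;\; \forall y, z \in X : y \le_c z \qquad J^*_L(\zeta, y) \le J^*_L(\zeta, z) . \tag{81} $$

Proof. Let $L$ be given. For a trace $\zeta$ we define $\tau_\zeta = L - |\zeta|$. The proof is by induction on $\tau_\zeta$.
Base case.

$$ \forall \zeta : \tau_\zeta = 1 \;\; \forall x \in X \qquad J^*_L(\zeta, x) = E_d[g(x, d) \mid \zeta] \tag{82} $$

$$ \Rightarrow \;\; \forall y \le_c z \qquad J^*_L(\zeta, y) \le J^*_L(\zeta, z) . \tag{83} $$

Induction. Since by inductive hypothesis we assume that $\forall \theta : \tau_\theta < \tau_\zeta$

$$ \forall y \le_c z \qquad J^*_L(\theta, y) \le J^*_L(\theta, z) , \tag{84} $$

we obtain, $\forall x \in X$,

$$ J^*_L(\zeta, x) = E_d\left[\min_u \left\{ g(x, d) + J^*_L(\zeta d, f(x, d, u)) \right\} \,\middle|\, \zeta \right] \tag{85} $$

$$ = E_d[g(x, d) \mid \zeta] + E_d\left[J^*_L(\zeta d, f^{LRU}(x, d)) \mid \zeta\right] . \tag{86} $$

Let $\theta = \zeta d$; since $\forall y \le_c z$ $J^*_L(\theta, y) \le J^*_L(\theta, z)$ and, by the inductive hypothesis and Lemma 4,

$$ J^*_L(\theta, f^{LRU}(y, d)) \le J^*_L(\theta, f^{LRU}(z, d)) , \tag{87} $$

we finally obtain $J^*_L(\zeta, y) \le J^*_L(\zeta, z)$. □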

7.2 Non-Decreasing Access Distribution 

In this subsection we prove that MRU is the optimal eviction policy for non-decreasing $s$ for
any time horizon. The proof will be by induction: by assuming the optimal policy to be MRU
for $t \le \tau$ we will be able to prove its optimality for the time horizon $\tau + 1$ (more precisely, a
strengthened inductive hypothesis will be used).

In order to compare costs under MRU for different initial states we introduce a useful
partition of the misses. Imagine placing an observer on every out-of-buffer item, following
the item as it goes down the LRU stack during the system evolution; every time an out-of-buffer
item is accessed, its observer moves to the item evicted by the policy. Thus the set $\Psi$ of the
observers remains constant during the evolution.

Let $d_t$ be the access depth at time $t$, let $\psi$ be an observer and $l_t(\psi, x_0, d_{t'<t})$ its LRU
stack depth at time $t$. We are interested in the event that the item observed by $\psi$ is accessed at
time $t$: $d_t = l_t(\psi, x_0, d_{t'<t})$. We can partition the misses occurring in $\tau$ steps attributing each
miss to the observer on the item currently accessed:

$$ J^{\pi}_{\tau}(x_0) = \sum_{t=0}^{\tau-1} \Pr[\mathrm{Miss}(t)] = \sum_{t=0}^{\tau-1} \sum_{\psi \in \Psi} \Pr[d_t = l_t(\psi, x_0, d_{t'<t})] = \sum_{\psi \in \Psi} \sum_{t=0}^{\tau-1} \Pr[d_t = l_t(\psi, x_0, d_{t'<t})] . \tag{88} $$



Let $\psi_j$ be the observer which is at depth $j$ at time 0. Under MRU the evolution of an
observer $\psi_j$ does not depend on $x_0$ but only on the initial position of the observed item (i.e.,
$l_t^{MRU}(\psi_j, x_0, d_{t'<t}) = l_t^{MRU}(\psi_j, d_{t'<t}) = l_t(\psi_j)$ for brevity). If we have two states $x'$ and $x''$
which differ for only two observers $\psi_i$ and $\psi_j$ (with $\Psi'$ and $\Psi''$ the respective observer sets), we can write their costs $\Gamma'$ and $\Gamma''$ as:

$$ \Gamma' = \sum_{\psi \in \Psi' \setminus \{\psi_i\}} \sum_{t=0}^{\tau-1} \Pr[d_t = l_t(\psi)] + \sum_{t=0}^{\tau-1} \Pr[d_t = l_t(\psi_i)] , $$

$$ \Gamma'' = \sum_{\psi \in \Psi'' \setminus \{\psi_j\}} \sum_{t=0}^{\tau-1} \Pr[d_t = l_t(\psi)] + \sum_{t=0}^{\tau-1} \Pr[d_t = l_t(\psi_j)] , \tag{89} $$

where the first term is equal in both the costs, because it is due to observers which start in
the same position for both states, and thus:

$$ \Gamma' - \Gamma'' = \sum_{t=0}^{\tau-1} \Pr[d_t = l_t(\psi_i)] - \sum_{t=0}^{\tau-1} \Pr[d_t = l_t(\psi_j)] = \gamma_\tau(i) - \gamma_\tau(j) , \tag{90} $$

having defined $\gamma_\tau(i) \triangleq \sum_{t=0}^{\tau-1} \Pr[d_t = l_t(\psi_i)]$. Thus, the difference in the costs depends only on
the items observed by the different observers. Quantity $\gamma_\tau(i)$ represents the contribution to
the total cost due to items observed by $\psi_i$, the observer that at time zero is in position $i$ (not
in the buffer).

To prove that MRU is optimal for a time horizon of $\tau + 1$ under the hypothesis that it is
optimal for any $t \le \tau$ it is sufficient to prove that $\gamma_\tau(i) \le \gamma_\tau(j)$ if $i < j$:

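Proposition 8. Let $s$ be non-decreasing: $\forall j \in \{1, \ldots, V - 1\}$ $s(j) \le s(j + 1)$. If

$$ \forall t \le \tau \;\; \forall i\, \forall j : i < j \qquad \gamma_t(2) \le \gamma_t(i) \le \gamma_t(j) \le 1 + \gamma_t(2) , \tag{91} $$

then

$$ \forall i\, \forall j : i < j \qquad \gamma_{\tau+1}(2) \le \gamma_{\tau+1}(i) \le \gamma_{\tau+1}(j) \le 1 + \gamma_{\tau+1}(2) . \tag{92} $$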
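Proof. Base case. For $t = 1$ we have $\forall k$ $\gamma_1(k) = s(k)$, and hence

$$ \gamma_1(2) \le \gamma_1(i) \le \gamma_1(j) . \tag{93} $$

Furthermore we have that

$$ \gamma_1(k) = s(k) \le 1 \le 1 + \gamma_1(2) . \tag{94} $$

Induction.

$$
\begin{aligned}
\gamma_{\tau+1}(i) &= s(i)(1 + \gamma_\tau(2)) + S(i - 1)\gamma_\tau(i) + (1 - S(i))\gamma_\tau(i + 1) \\
&\le s(i)(1 + \gamma_\tau(2)) + (1 - s(i))\gamma_\tau(i + 1) \\
&\le s(i)(1 + \gamma_\tau(2)) + (1 - s(i))\gamma_\tau(j) \\
&= s(i)(1 + \gamma_\tau(2)) + s(j)\gamma_\tau(j) + (1 - s(i) - s(j))\gamma_\tau(j) ,
\end{aligned}
\tag{95}
$$

$$
\begin{aligned}
\gamma_{\tau+1}(j) &= s(j)(1 + \gamma_\tau(2)) + S(j - 1)\gamma_\tau(j) + (1 - S(j))\gamma_\tau(j + 1) \\
&\ge s(j)(1 + \gamma_\tau(2)) + (1 - s(j))\gamma_\tau(j) \\
&= s(j)(1 + \gamma_\tau(2)) + s(i)\gamma_\tau(j) + (1 - s(i) - s(j))\gamma_\tau(j) \\
&\Rightarrow \gamma_{\tau+1}(i) \le \gamma_{\tau+1}(j) .
\end{aligned}
\tag{96}
$$

Finally under MRU we have

$$
\begin{aligned}
\gamma_{\tau+1}(k) &= s(k)(1 + \gamma_\tau(2)) + S(k - 1)\gamma_\tau(k) + (1 - S(k))\gamma_\tau(k + 1) \\
&\ge \min\{1 + \gamma_\tau(2), \gamma_\tau(k), \gamma_\tau(k + 1)\} \\
&= \gamma_\tau(k) ,
\end{aligned}
\tag{97}
$$

and

$$
\begin{aligned}
\gamma_{\tau+1}(k) &= s(k)(1 + \gamma_\tau(2)) + S(k - 1)\gamma_\tau(k) + (1 - S(k))\gamma_\tau(k + 1) \\
&\le \max\{1 + \gamma_\tau(2), \gamma_\tau(k), \gamma_\tau(k + 1)\} \\
&= 1 + \gamma_\tau(2) ,
\end{aligned}
\tag{98}
$$

and therefore

$$ \gamma_{\tau+1}(k) \le 1 + \gamma_{\tau+1}(2) . \tag{99} $$

□

Proof of Thm. 7. Thm. 7 follows directly from Prop. 8. □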
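
The recursion used in the inductive step can also be evaluated numerically. The following sketch iterates it exactly with rational arithmetic, under the assumptions that $S(k) = s(1) + \cdots + s(k)$ is the cumulative distribution and that observers sit at depths $2, \ldots, V$; the function name `gamma_table` is hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def gamma_table(s, horizon):
    """Return {t: gamma_t} for t = 1..horizon; gamma_t[k] is used for k = 2..V."""
    V = len(s)
    S = [Fraction(0)] + list(accumulate(s))      # S[k] = s(1) + ... + s(k)
    prev = [Fraction(0)] * (V + 2)
    prev[2:V + 1] = s[1:]                        # gamma_1(k) = s(k)
    table = {1: prev}
    for t in range(1, horizon):
        cur = [Fraction(0)] * (V + 2)
        for k in range(2, V + 1):
            cur[k] = (s[k - 1] * (1 + prev[2])   # observed item accessed: a miss
                      + S[k - 1] * prev[k]       # access above: depth unchanged
                      + (1 - S[k]) * prev[k + 1])  # access below: one step deeper
        table[t + 1] = cur
        prev = cur
    return table
```

For a non-decreasing `s`, the resulting values can be inspected to see the ordering $\gamma_t(2) \le \gamma_t(i) \le \gamma_t(j) \le 1 + \gamma_t(2)$ for $i < j$ on a concrete instance.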

8 On Bias Optimality 

Bias optimality is a stronger property than average (gain) optimality, since it also takes into
account the cost minimization in transient states of the dynamical system (whereas average
costs are insensitive to policy changes in transient states, provided the set of recurrent states
stays unchanged). Bias optimal policies are characterized as solutions of the Bellman equation
(see Prop. [T]). In this section we provide evidence of the hardness of the general solution of the
Bellman equation in two ways:

• We prove that bias-optimal policies in general do not satisfy the inclusion property (the 
same result also applies to optimal policies in a finite horizon). 

• We derive the complex solution of the Bellman equation for the relatively simple case of 
C = 2. 

Theorem 9. There are systems for which the unique optimal policy over some finite horizon 
and bias-optimal over infinite horizon is not a stack policy. 

Proof. We will exhibit a counterexample of a distribution $s$ that has optimal policies not
induced by a priority (and hence not a stack policy). In more detail, we first obtain by dynamic
programming (executed by a computer program) the finite horizon optimal policies for two
different buffer capacities $C'$ and $C''$ (with $C' < C''$). Starting with buffers that satisfy the
inclusion ($B_0(C') \subseteq B_0(C'')$) we show that there exist a temporal horizon $\tau$ and state positions $j'$
and $j''$ such that, when in a state with both positions filled, for $C = C'$ the optimal policy
evicts at $j'$, whereas for $C = C''$ it evicts at $j''$. By solving (by a computer program) the
Bellman equation associated with the system we also prove that a similar situation applies to
the infinite horizon case, implying that the unique bias-optimal policy in infinite horizon does
not have the inclusion property.

Consider the following $s$ distribution, with $V = 8$ and $\beta$ = j^.

$s(j)$:          | $\beta$ | $3\beta$ | $3\beta$ |   | $4\beta$ |   |   | $5\beta$ |
$j \in [1, V]$:  |    1    |    2     |    3     | 4 |    5     | 6 | 7 |    8     |

We are given an initial LRU stack $\Lambda_0$ and we consider the following initial buffers $B_0(2)$ and
$B_0(3)$, satisfying the inclusion property $B_0(2) \subseteq B_0(3)$:

$$ B_0(2) = [\Lambda_0(1), \Lambda_0(4)] , \qquad B_0(3) = [\Lambda_0(1), \Lambda_0(4), \Lambda_0(7)] . \tag{100} $$

If an access arrives at $\Lambda_0(8)$, a miss occurs in both buffers, and hence an eviction is
needed. By computing the optimal policy for a time horizon of $\tau = 5$ we see that

• for $C = 2$ the (unique) optimal eviction is at depth 2 ($\Lambda_0(1)$),

• for $C = 3$ the (unique) optimal eviction is at depth 5 ($\Lambda_0(4)$).
After the optimal evictions the two buffers become

$$ B_1(2) = [\Lambda_0(8), \Lambda_0(4)] = [\Lambda_1(1), \Lambda_1(5)] , \tag{101} $$

$$ B_1(3) = [\Lambda_0(8), \Lambda_0(1), \Lambda_0(7)] = [\Lambda_1(1), \Lambda_1(2), \Lambda_1(8)] , \tag{102} $$

hence violating the inclusion property. The same eviction choice is given by the solution of
the Bellman equation in infinite horizon, proving that bias-optimal policies are not, in general,
stack policies. (Intuitively, the inclusion property violation happens because having
$\Lambda(8)$ in the buffer decreases the "profit" of having $\Lambda(5)$: in fact a possible subsequent access to $\Lambda(8)$
leads to a new state with high instant cost $g$, since it has in the buffer the low-profit item
$\Lambda(6)$; whereas when $C = 2$ the same access causes a miss, thus enabling the eviction of the
poorly profitable $\Lambda(6)$.) □

8.1 The Bias-Optimal Policy for C = 2

When $C = 2$, the state of our dynamical system can be identified by the unique index
$j \in \{2, \ldots, V\}$ such that the buffer contains the items in positions 1 and $j$ of the LRU stack.
The Bellman equation becomes

$$ h(j) = 1 - s(j) + h(2) - \lambda + S(j - 1)\min\{0, h(j) - h(2)\} + (1 - S(j))\min\{0, h(j + 1) - h(2)\} . \tag{103} $$

The $h(j)$'s are defined up to an additive constant, so we can set $h(2) = 0$ to simplify the
equation:

$$ h(j) = 1 - s(j) - \lambda + S(j - 1)\min\{0, h(j)\} + (1 - S(j))\min\{0, h(j + 1)\} . \tag{104} $$

J*{j) - J*{2). We now "guess" the 



I3{j) = m^s{j,j + 1-1) , 

$(j) ^ {/ > 1 : VA; G {0, . . . , Z - 1} s(j + + Z - 1) > /3(2)} U {O} 



(105) 





where $\Phi(j)$ is the subsequence of items that are visited when applying the policy induced
using $\beta$ as a priority and $\phi(j)$ the length of this subsequence.

Proposition 9. Bellman equation (104) is solved using the following $\lambda$ and $h(j)$:



A = l-/3(2) , 

Sii - I) (106) 
h{3) = m - sij) - ^_^g^.J^^ Hj)p{j) - ^{j + 1)P(J + 1) • 

Proof. Let V'(j) - i-s{j-i) ^ 

h(i) = /-^(-^XjXj') p^^'^ > Hj) > 0) .^Q^. 

\/3(2)-s(j)-0(i + l)p(j + l) P{3)<0{^m = 0) 

This implies min {0, = — V'O)p(j)0(j)- Using this term we can see that the chosen A 
and h{j) satisfies (|104[). □ 



9 Conclusions 

In this paper, we have revisited the classical eviction problem, relating it to optimal control 
theory and introducing the average occupancy variant, which provides solutions and insights 
even for the classical, fixed occupancy version of the problem. 

A number of interesting and challenging issues remain open in the area of eviction policies
for the memory hierarchy. One objective is the search for optimal policies (or policies with good
performance guarantees), with fixed occupancy, for general HMRM traces. In this context, it
may be worthwhile to investigate the Least Profit Rate policy beyond the LRUSM model.

Within the LRUSM, we have considered policy design assuming a known stack-depth 
distribution: what performance guarantees can be achieved if the distribution is not known 
a priori, but perhaps estimated on-line, is another intriguing question, whose answer may 
have practical value for memory management in general purpose systems, where different 
applications are likely to conform to different distributions. 

In this work, we have also explored forms of optimality different from gain optimality. 
However, even within the LRUSM, we lack general solutions for a finite horizon as well as for 
infinite horizon, if we insist on bias optimality. 

A question underlying the entire area of eviction policies remains the choice of an appropriate 
stochastic model for the trace. While the LRUSM captures temporal locality in a reasonable 
fashion, it completely misses spatial locality, a property critically exploited in hardware and 
software systems. Spatial locality implies that certain subsets of the addressable items occur 
more frequently in short intervals of the trace than other subsets. On the contrary, the LRUSM 
is invariant under arbitrary permutations of the items. The Markov Reference Model can 
capture some level of temporal and spatial locality, for example if the transition graph contains
regions where outward transitions have low probability, thus corresponding to a sort of working
set. However, in real programs, the same item tends to occur in different working sets at
different times, that is, the same item can be accessed in different states of the trace, so that
the state cannot be identified with the last item that has been accessed, as in the MRM (see
also [33] for evidence on the limitations of Markov models). Suitable hidden Markov models
do not necessarily suffer from this limitation, which motivates further investigations of optimal
policies for the general HMRM.

A Bellman Equation

Let A be a discrete dynamical system and X, W and Q denote respectively its state, disturbance 
and control spaces; let V be the set of all the admissible policies to control the system. 

Definition 6. A system is said to be of type 1 (the standard model) if the policies are allowed
to choose the control $u$ at time $t$ only as a function of the state at the same time:

$$ u_t = \mu(x_t) \in U(x_t) , \tag{108} $$

where

$$ U : X \to 2^{Q} , \tag{109} $$

and

$$ \mu \in \mathcal{V} : X \to Q . \tag{110} $$

Definition 7. A system is said to be of type 2 (our model) if the policies are allowed to
choose the control $u$ at time $t$ as a function of the state and the disturbance at the same time:

$$ u_t = \mu(x_t, w_t) \in U(x_t, w_t) , \tag{111} $$

where

$$ U : X \times W \to 2^{Q} , \tag{112} $$

and

$$ \mu \in \mathcal{V} : X \times W \to Q . \tag{113} $$

Definition 8. A system $A = (X, W, Q, g, f, U(\cdot, \cdot))$ of type 2 is said to be equivalent to a
system $A' = (X', W', Q', g', f', U'(\cdot))$ of type 1 if and only if $X = X'$, $W = W'$, $g = g'$ and

$$ \forall x \in X \;\; \forall w \in W \;\; \forall y \in X : \qquad \exists u \in U(x, w) : f(x, u, w) = y \iff \exists u' \in U'(x) : f'(x, u', w) = y . \tag{114} $$

Remark 4. Given an initial state $x_0$ and a realization of $w_t$ for two equivalent systems $A$
and $A'$, and a sequence of controls $u_t$ for system $A$, it is always possible to find $u'_t$ such that the
state trajectories $x_t$ (and hence the costs) of the two equivalent systems are the same.

Lemma 5. For every system $A$ of type 2 there exists a system $A'$ of type 1 such that $A$ and $A'$ are
equivalent.

Proof. The proof is obtained by choosing as controls for $A'$ the policies of $A$. Let

$$ X' \triangleq X, \quad W' \triangleq W, \quad g' \triangleq g, \quad Q' \triangleq \mathcal{V}, \quad \forall x \in X \;\; U'(x) \triangleq \mathcal{V} , \tag{115} $$

$$ f'(x, u', w) \triangleq f(x, u'(x, w), w) \quad (\text{since } u' \in \mathcal{V}) . \tag{116} $$

Then $\forall x \in X$, $\forall w \in W$,

• $\forall u \in U(x, w)$, let $y = f(x, u, w)$. If we set $u'$ such that $u'(x, w) = u$ we have

$$ f'(x, u', w) = f(x, u'(x, w), w) = f(x, u, w) = y . \tag{117} $$

• $\forall u' \in U'(x)$, let $y = f'(x, u', w)$. If we set $u = u'(x, w)$ we have

$$ f(x, u, w) = f(x, u'(x, w), w) = f'(x, u', w) = y . \tag{118} $$

□

Optimal cost update equations. Let $w_t$ be a random process with values in $W$, i.i.d.
for different $t$'s. Let $A$ be a system of type 2 and $A'$ an equivalent system of type 1. Then the
cost update equation for $A$ can be written as:

$$ \forall x \in X \qquad J^*_\tau(x) = E_w\left[ \min_{u \in U(x, w)} \left\{ g(x, w) + J^*_{\tau-1}(f(x, u, w)) \right\} \right] , \tag{119} $$

$$ J^*_\tau = T J^*_{\tau-1} , \tag{120} $$

whereas the same equation for $A'$ is:

$$ \forall x \in X \qquad J^*_\tau(x) = \min_{u' \in U'(x)} \left\{ E_w\left[ g(x, w) + J^*_{\tau-1}(f'(x, u', w)) \right] \right\} , \tag{121} $$

$$ J^*_\tau = T' J^*_{\tau-1} . \tag{122} $$

Remark 5. Since equivalent systems can reproduce each other's state evolution, their optimal
costs are the same, in particular

$$ T = T' . \tag{123} $$

We recall the classical Bellman equation theorem: 

Theorem 10 (Standard Bellman equation). Given a dynamical system $A'$ of type 1, if $\exists \lambda$
and $\exists h$ such that

$$ \lambda \mathbf{1} + h = T' h , \tag{124} $$
then $\lambda$ is the optimal average cost of $A'$ and $h$ are the differential costs of the states, i.e.

$$ \forall x, y \in X \qquad \lim_{\tau \to \infty} J^*_\tau(x) - J^*_\tau(y) = h(x) - h(y) . \tag{125} $$
We are now ready to prove our version of the Bellman equation for a system $A$ of type 2:

Proof of Prop. [T]. Consider a system $A'$ equivalent to $A$. Applying Thm. 10 we have that,
if we can solve the Bellman equation for $A'$ then we have found its optimal average cost and
differential costs vector. Since the two systems are equivalent this implies that they are also
the corresponding costs of $A$. Hence we have

$$ \exists \lambda\, \exists h : \lambda \mathbf{1} + h = T' h \;\Rightarrow\; \lambda \text{ and } h \text{ costs for } A , \tag{126} $$

but since $T' h = T h$ we finally have

$$ \exists \lambda\, \exists h : \lambda \mathbf{1} + h = T h \;\Rightarrow\; \lambda \text{ and } h \text{ costs for } A . \tag{127} $$

□